Friday, June 5, 2015

Reinforcement Learning, Control, and 3D Visualization

Over the last few months I've spent a lot of time studying optimal control and reinforcement learning. Aside from reading, one of the best ways to learn about something is to do it yourself, which in this case means a lot of playing around with the well known algorithms and, for the ones I really like, adding them to dlib, which is the subject of this post.  So far I've added two methods.  The first, included in a previous dlib release, was the well known least squares policy iteration reinforcement learning algorithm.  The second, and my favorite so far due to its practicality, is a tool for solving model predictive control problems.

There is a dlib example program that explains the new model predictive control tool in detail.  But the basic idea is that it takes as input a simple linear equation defining how some process evolves in time and then tells you what control input you should apply to make the process go into some user specified state.  For example, imagine you have an air vehicle with a rocket on it and you want it to hover at some specific location in the air.  You could use a model predictive controller to find out what direction to fire the rocket at each moment to get the desired outcome.  In fact, the dlib example program does exactly that.  It produces the following visualization, where the vehicle is the black dot and you want it to hover at the green location.  The rocket thrust is shown as the red line:
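To make the receding-horizon idea concrete, here is a deliberately crude sketch in Python.  This is not dlib's API (dlib's tool is C++ and properly optimizes over the whole control sequence for a linear model); every name below is invented for illustration.  The toy controller just evaluates a handful of candidate constant controls against the model and applies whichever one is predicted to end up closest to the target, then re-plans on the next tick:

```python
# Toy receding-horizon controller driving a 1-D point mass to a target
# position.  All names here are invented for illustration; this is a sketch
# of the idea, not dlib's model predictive control implementation.

def predict(pos, vel, accel, steps, dt=0.1):
    """Roll the linear model forward `steps` ticks under a constant control."""
    for _ in range(steps):
        pos += vel * dt
        vel += accel * dt
    return pos

def mpc_step(pos, vel, target, horizon=20,
             candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Pick the control whose predicted end position is nearest the target."""
    return min(candidates,
               key=lambda a: abs(predict(pos, vel, a, horizon) - target))

# Closed loop: re-plan from the current state at every time step.
pos, vel, dt = 0.0, 0.0, 0.1
for _ in range(200):
    accel = mpc_step(pos, vel, target=5.0)
    pos += vel * dt
    vel += accel * dt
print(round(pos, 2))
```

Re-planning from the measured state at every step is what makes this "model predictive": the model is only trusted a short horizon into the future, so modeling errors don't get a chance to accumulate.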

Another fun new tool in dlib is the perspective_window.  It's a super easy to use tool for visualizing 3D point cloud data.  For instance, the included example program shows how to make this:

Finally, Patrick Snape contributed Python bindings for dlib's video tracker, so now you can use it from Python.  To try out these new tools download the newest dlib release.

Tuesday, February 3, 2015

Python Stuff and Real-Time Video Object Tracking

The new version of dlib is out today. As promised, there is now a full Python API for using dlib's state-of-the-art object pose estimation and learning tools.  You can see examples of this API here and here.  Thanks to Patrick Snape, one of the main developers of the menpo project, for this addition.

Also, I've added an implementation of the winning algorithm from last year's Visual Object Tracking Challenge.  This was a method described in the paper:
Danelljan, Martin, et al. "Accurate scale estimation for robust visual tracking." Proceedings of the British Machine Vision Conference (BMVC), 2014.
You can see some videos showing dlib's implementation of this new tracker in action on YouTube:

All these videos were processed by exactly the same piece of software.  No hand tweaking or any funny business.  The only required input (other than the raw video) is a bounding box on the first frame and then the tracker automatically follows whatever is inside the box after that.  The whole thing runs at over 150fps on my desktop.  You can see an example program showing how to use it here, or just go download the new dlib instead :)

I've also finally posted the paper I've been writing on dlib's structural SVM based training algorithm, which is the algorithm behind the easy to use object detector.

Saturday, December 20, 2014

Dlib 18.12 released

I just released the next version of dlib.  This time I added tools for computing 2D FFTs, Hough transforms, image skeletonizations, and also a simple and type safe API for calling C++ code from MATLAB.  Readers familiar with writing MATLAB mex functions know how much of a pain it is, but no longer!  Here is an example of a C++ function callable from MATLAB using dlib's new MATLAB binding API.  You can also compile it with CMake so building it is super easy. There is an example CMake file in the dlib/matlab folder showing how to set it up.  I also used this tool to give the MITIE project a simple MATLAB API. So you can see another example of how easy it is to set this up in the MITIE MATLAB example.  

There are also some fun new things in the pipe for the next dlib release (v18.13).  First, Patrick Snape, one of the main developers of the menpo project, is adding a Python interface to dlib's shape prediction tools. You can follow that over on dlib's github repo.  I'm also working on a single object tracker for OpenCV's Vision Challenge which I plan to include in the next version of dlib.

Saturday, November 15, 2014

Dlib 18.11 released

The new version of dlib is out. This release contains mostly minor bug fixes and usability improvements, with the notable exception of new routines for extracting local-binary-pattern features from images and improved tools for learning distance metrics. See the release notes for further information.

I also recently found out about two particularly interesting projects that use dlib.  The first is menpo, a Python library focused on computer vision which is being developed by a team at Imperial College London.  If you are interested in a Python library that pulls together a bunch of computer vision tools then definitely check it out.  The other interesting project is Ceemple, which is basically an interactive language shell for C++.  They have integrated a bunch of libraries like dlib and OpenCV into it with the general goal of making C++ development feel more rapid and interactive.  So think of something like MATLAB or IPython, but for C++.

Tuesday, October 21, 2014

MITIE v0.3 Released: Now with Java and R APIs

We just made the next release of MITIE, a new DARPA funded information extraction tool being created by our team at MIT. This release is relatively minor and just adds APIs for Java and R.  The project page on github explains how to get started using either of these APIs.  

I want to take some time and explain how the Java API is implemented since, as I discovered while making MITIE's Java API, there aren't clear instructions for doing this anywhere on the internet. Hopefully this little tutorial will help you if you decide to make a similar Java binding to a C++ library.  To begin, let's think about the requirements for a good Java binding:
  • You should be able to compile it from source with a simple command
  • A user of your library should not need to edit or configure anything to compile the API
  • The compilation process should work on any platform
  • Writing JNI is awful so you shouldn't have to do that
This pretty much leads you to Swig and CMake which are both great tools.  However, finding out how to get CMake to work with Swig was painful and is pretty much what this blog post is about.  Happily, it's possible to do and results in a very clean and easy to use mechanism for creating Java APIs.  In particular, you can compile MITIE's Swig/CMake based Java API using the usual CMake commands:
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install
That creates a jar file and shared library file which together form the MITIE Java API.  Let's run through a little example to see how you can define new Java APIs.  Imagine you have created a simple C++ API that looks like this:
void printSomeString (const std::string& message);

class MyClass {
public:
    std::vector<std::string> getSomeStrings() const;
};
and you want to be able to use it from Java.  You just need to put this C++ API in a header file called swig_api.h and include some Swig commands that tell it what to call std::vector<std::string> in the generated Java API.  So the contents of swig_api.h would look like:
// Define some swig type maps that tell swig what to call various instantiations of
// std::vector.
#ifdef SWIG
%include "std_string.i"
%include "std_vector.i"
%template(StringVector)         std::vector<std::string>;
#endif

#include <string>
#include <vector>

void printSomeString (const std::string& message);

class MyClass {
public:
    std::vector<std::string> getSomeStrings() const;
};
The next step is to create a CMakeLists.txt file that tells CMake how to compile your API.  In our case, it would look like:

cmake_minimum_required (VERSION 2.8.4)

# List the source files you want to compile into the Java API.  These contain 
# things like implementations of printSomeString() and whatever else you need.
set(source_files my_source.cpp another_source_file.cpp )

# List the folders that contain your header files
include_directories( . )

# List of libraries to link to.  For example, you might need to link to pthread
set(additional_link_libraries pthread)

# Tell CMake to put the compiled shared library and example.jar file into the
# same folder as this CMakeLists.txt file when the --target install option is
# executed. You can put any folder here, just give a path that is relative to
# the CMakeLists.txt file.
set(install_target_output_folder .)

# Pull in the CMake machinery that runs Swig, compiles everything, and
# produces the jar and shared library.
include(cmake_swig_jni)

That's it.  Now you can compile your Java API using CMake and you will get an example.jar file along with a shared library (a .dll, .so, or .dylib depending on your platform).  Then to use it you can write Java code like this:
public class Example {
    public static void main(String args[]) {
        global.printSomeString("hello world!");

        MyClass obj = new MyClass();
        StringVector temp = obj.getSomeStrings();
        for (int i = 0; i < temp.size(); ++i)
            System.out.println(temp.get(i));
    }
}
and execute it via:
javac -classpath example.jar Example.java
java -classpath example.jar;. -Djava.library.path=. Example

assuming the example.jar and shared library are in your current folder.  Note that Linux or OS X users will need to use a : as the classpath separator rather than the ; required on Windows.  But that's it!  You just made a Java interface to your C++ library.  You might have noticed the include(cmake_swig_jni) statement though.  That is a bunch of CMake magic I had to write to make all this work, but it does work, and on different platforms, without trouble.  You can see a larger example of a Java to C++ binding in MITIE's github repo using this same setup.

Thursday, August 28, 2014

Real-Time Face Pose Estimation

I just posted the next version of dlib, v18.10, and it includes a number of new minor features.  The main addition in this release is an implementation of an excellent paper from this year's Computer Vision and Pattern Recognition Conference:
One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan
As the name suggests, it allows you to perform face pose estimation very quickly. In particular, this means that if you give it an image of someone's face it will add this kind of annotation:

In fact, this is the output of dlib's new face landmarking example program on one of the images from the HELEN dataset.  To get an even better idea of how well this pose estimator works take a look at this video where it has been applied to each frame:

It doesn't just stop there though.  You can use this technique to make your own custom pose estimation models.  To see how, take a look at the example program for training these pose estimation models.

Thursday, July 10, 2014

MITIE v0.2 Released: Now includes Python and C++ APIs for named entity recognition and binary relation extraction

A few months ago I posted about MITIE, the new DARPA funded information extraction tool being created by our team at MIT. At the time it only provided English named entity recognition and sported a simple C API.  Since then we have been busy adding new features and today we released a new version of MITIE which adds a bunch of nice things, including:
  • Python and C++ APIs
  • Many example programs
  • 21 English binary relation extractors which identify pairs of entities with certain relations.  E.g. "PERSON BORN_IN PLACE"
  • Python, C, and C++ APIs for training your own named entity and binary relation extractors
You can get MITIE from its github page.  Then you can try out some of the new features in v0.2, one of which is binary relation extraction.  This means you can ask MITIE if two entities participate in some known relationship, for example, you can ask if a piece of text is making the claim that a person was born in a location.  I.e. Are the person and location entities participating in the "born in" relationship?

In particular, you could run MITIE over all the Wikipedia articles that mention Barack Obama and find each instance where someone made the claim that Barack Obama was born in some place.  I did this with MITIE and found the following:

  • 14 claims that Barack Obama was born in Hawaii
  • 5 claims that Barack Obama was born in the United States
  • 3 claims that Barack Obama was born in Kenya

Which is humorous.  One of them is the sentence:
You can still find sources of that type which still assert that "Barack Obama was born in Kenya"
When you read it in the broader context of the article it's clear that it's not claiming he was born in Kenya.  So this is a good example of why it's important to aggregate over many relation instances when using a relation extractor.  By aggregating many examples we can get reasonably accurate outputs in the face of these kinds of mistakes.  
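The counts above make the aggregation point easy to see in code.  This little majority vote is my own toy sketch, not part of MITIE, using the claim counts found in the text; the three bad extractions are simply outvoted:

```python
# Toy illustration of aggregating noisy relation-extractor outputs: any single
# extraction can be wrong, but a majority vote over many instances recovers
# the right answer.  The counts are the ones found by running MITIE above.
from collections import Counter

claims = Counter({"Hawaii": 14, "the United States": 5, "Kenya": 3})
birthplace, votes = claims.most_common(1)[0]
print(birthplace, votes)  # Hawaii 14
```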

However, what is even more entertaining than poking fun at American political dysfunction is MITIE's new API for creating your own entity and relation extractors.  We worked to make this very easy to use, and in particular, there are no parameters you need to mess with; everything is handled internally by MITIE.  All you, the user, need to do is give example data showing what you want MITIE to learn to detect and it takes care of the rest.  Moreover, in the spirit of easy to use APIs, we also added a new Python API that allows you to exercise all the functionality in MITIE via Python.  As a little example, here is how you use it to find named entities:
from mitie import *
ner = named_entity_extractor('MITIE-models/english/ner_model.dat')
tokens = tokenize("The MIT Information Extraction (MITIE) tool was created \
                   by Davis King, Michael Yee, and Wade Shen at the \
                   Massachusetts Institute of Technology.")
print tokens
This loads in the English named entity recognizer model that comes with MITIE and then tokenizes the sentence.  So the print statement produces 
['The', 'MIT', 'Information', 'Extraction', '(', 'MITIE', ')', 'tool', 'was', 'created', 'by', 'Davis', 'King', ',', 'Michael', 'Yee', ',', 'and', 'Wade', 'Shen', 'at', 'the', 'Massachusetts', 'Institute', 'of', 'Technology', '.']
Then to find the named entities we simply do
entities = ner.extract_entities(tokens)
print "Number of entities detected:", len(entities)
print "Entities found:", entities
Which prints:
Number of entities detected: 6
Entities found: [(xrange(1, 4), 'ORGANIZATION'), (xrange(5, 6), 'ORGANIZATION'), (xrange(11, 13), 'PERSON'), (xrange(14, 16), 'PERSON'), (xrange(18, 20), 'PERSON'), (xrange(22, 26), 'ORGANIZATION')]
So the output is just a list of ranges and labels.  Each range indicates which tokens are part of that entity.  To print these out in a nice list we would just do
for e in entities:
    range = e[0]
    tag = e[1]
    entity_text = " ".join(tokens[i] for i in range)
    print tag + ": " + entity_text
Which prints:
ORGANIZATION: MIT Information Extraction
PERSON: Davis King
PERSON: Michael Yee
PERSON: Wade Shen
ORGANIZATION: Massachusetts Institute of Technology