Thursday, August 28, 2014

Real-Time Face Pose Estimation

I just posted the next version of dlib, v18.10, and it includes a number of new minor features.  The main addition in this release is an implementation of an excellent paper from this year's Computer Vision and Pattern Recognition Conference:
One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan
As the name suggests, it allows you to perform face pose estimation very quickly. In particular, this means that if you give it an image of someone's face it will add this kind of annotation:

In fact, this is the output of dlib's new face landmarking example program on one of the images from the HELEN dataset.  To get an even better idea of how well this pose estimator works, take a look at this video where it has been applied to each frame:


It doesn't just stop there though.  You can use this technique to make your own custom pose estimation models.  To see how, take a look at the example program for training these pose estimation models.
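To give a rough flavor of the new API, here is a minimal sketch of running a trained pose estimator from Python.  The model and image file names below are placeholders of my own; use whatever files you have trained or downloaded:
import dlib
from skimage import io

detector = dlib.get_frontal_face_detector()
# Load a trained pose estimation model (placeholder file name).
predictor = dlib.shape_predictor('shape_predictor.dat')

img = io.imread('face.jpg')
for face in detector(img):
    # Locate the facial landmarks inside the detected face box.
    shape = predictor(img, face)
    print "first two landmarks:", shape.part(0), shape.part(1)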

Thursday, July 10, 2014

MITIE v0.2 Released: Now includes Python and C++ APIs for named entity recognition and binary relation extraction

A few months ago I posted about MITIE, the new DARPA-funded information extraction tool being created by our team at MIT. At the time it only provided English named entity recognition and sported a simple C API.  Since then we have been busy adding new features, and today we released a new version of MITIE which adds a bunch of nice things, including:
  • Python and C++ APIs
  • Many example programs
  • 21 English binary relation extractors which identify pairs of entities with certain relations.  E.g. "PERSON BORN_IN PLACE"
  • Python, C, and C++ APIs for training your own named entity and binary relation extractors
You can get MITIE from its github page.  Then you can try out some of the new features in v0.2, one of which is binary relation extraction.  This means you can ask MITIE if two entities participate in some known relationship; for example, you can ask if a piece of text is making the claim that a person was born in a location.  That is, are the person and location entities participating in the "born in" relationship?

In particular, you could run MITIE over all the Wikipedia articles that mention Barack Obama and find each instance where someone made the claim that Barack Obama was born in some place.  I did this with MITIE and found the following:

  • 14 claims that Barack Obama was born in Hawaii
  • 5 claims that Barack Obama was born in the United States
  • 3 claims that Barack Obama was born in Kenya

Which is humorous.  One of them is the sentence:
You can still find sources of that type which still assert that "Barack Obama was born in Kenya"
When you read it in the broader context of the article it's clear that it's not claiming he was born in Kenya.  So this is a good example of why it's important to aggregate over many relation instances when using a relation extractor.  By aggregating many examples we can get reasonably accurate outputs in the face of these kinds of mistakes.  
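For the curious, here is a rough sketch of what asking this kind of question looks like with the new Python API.  The relation detector file name below is an assumption on my part; check the MITIE-models folder for the actual model names:
from mitie import *

ner = named_entity_extractor('MITIE-models/english/ner_model.dat')
# Placeholder file name for the "person born in place" relation detector.
rel_detector = binary_relation_detector('rel_classifier_people.person.place_of_birth.svm')

tokens = tokenize("Barack Obama was born in Hawaii.")
person = xrange(0, 2)   # tokens making up the person entity
place = xrange(5, 6)    # tokens making up the place entity

# Build the relation features for this entity pair and score them.
# A positive score means MITIE thinks the relation holds.
rel = ner.extract_binary_relation(tokens, person, place)
score = rel_detector(rel)
print "relation score:", score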

However, what is even more entertaining than poking fun at American political dysfunction is MITIE's new API for creating your own entity and relation extractors.  We worked to make this very easy to use; in particular, there are no parameters you need to mess with, as everything is handled internally by MITIE.  All you, the user, need to do is give example data showing what you want MITIE to learn to detect and it takes care of the rest (a small training sketch appears after the example below).  Moreover, in the spirit of easy to use APIs, we also added a new Python API that allows you to exercise all the functionality in MITIE via Python.  As a little example, here is how you use it to find named entities:
from mitie import *
ner = named_entity_extractor('MITIE-models/english/ner_model.dat')
tokens = tokenize("The MIT Information Extraction (MITIE) tool was created \
                   by Davis King, Michael Yee, and Wade Shen at the \
                   Massachusetts Institute of Technology.")
print tokens
This loads in the English named entity recognizer model that comes with MITIE and then tokenizes the sentence.  So the print statement produces 
['The', 'MIT', 'Information', 'Extraction', '(', 'MITIE', ')', 'tool', 'was', 'created', 'by', 'Davis', 'King', ',', 'Michael', 'Yee', ',', 'and', 'Wade', 'Shen', 'at', 'the', 'Massachusetts', 'Institute', 'of', 'Technology', '.']
Then to find the named entities we simply do
entities = ner.extract_entities(tokens)
print "Number of entities detected:", len(entities)
print "Entities found:", entities
Which prints:
Number of entities detected: 6
Entities found: [(xrange(1, 4), 'ORGANIZATION'), (xrange(5, 6), 'ORGANIZATION'), (xrange(11, 13), 'PERSON'), (xrange(14, 16), 'PERSON'), (xrange(18, 20), 'PERSON'), (xrange(22, 26), 'ORGANIZATION')]
So the output is just a list of ranges and labels.  Each range indicates which tokens are part of that entity.  To print these out in a nice list we would just do
for e in entities:
    entity_range = e[0]
    tag = e[1]
    entity_text = " ".join(tokens[i] for i in entity_range)
    print tag + ": " + entity_text
Which prints:
ORGANIZATION: MIT Information Extraction
ORGANIZATION: MITIE
PERSON: Davis King
PERSON: Michael Yee
PERSON: Wade Shen
ORGANIZATION: Massachusetts Institute of Technology
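As a taste of the new training API mentioned above, here is a rough sketch of training your own named entity extractor.  The feature extractor path below is an assumption of mine; it is one of the files shipped with the MITIE models:
from mitie import *

# Mark up a training sentence: its tokens plus entity ranges and labels.
sample = ner_training_instance(["Davis", "King", "is", "the", "main", "author", "of", "MITIE", "."])
sample.add_entity(xrange(0, 2), "person")
sample.add_entity(xrange(7, 8), "software")

# The trainer builds on MITIE's pretrained word features.
trainer = ner_trainer("MITIE-models/english/total_word_feature_extractor.dat")
trainer.add(sample)
ner = trainer.train()
ner.save_to_disk("new_ner_model.dat")
In a real application you would of course add many more training instances before calling train().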

Wednesday, April 9, 2014

Dlib 18.7 released: Make your own object detector in Python!

A while ago I boasted about how dlib's object detection tools are better than OpenCV's. However, one thing OpenCV had on dlib was a nice Python API, but no longer!  The new version of dlib is out and it includes a Python API for using and creating object detectors. What does this API look like? Well, let's start by imagining you want to detect faces in this image:


You would begin by importing dlib and scikit-image:
import dlib
from skimage import io
Next you load dlib's default face detector and the image of Obama, and then invoke the detector on the image:
detector = dlib.get_frontal_face_detector()
img = io.imread('obama.jpg')
faces = detector(img)
The result is an array of boxes called faces. Each box gives the pixel coordinates that bound each detected face. To get these coordinates out of faces you do something like:
for d in faces:
    print "left,top,right,bottom:", d.left(), d.top(), d.right(), d.bottom()
We can also view the results graphically by running:
win = dlib.image_window()
win.set_image(img)
win.add_overlay(faces)

But what if you wanted to create your own object detector?  That's easy too.  Dlib comes with an example program and a sample training dataset showing how to do this.  But to summarize, you do:
options = dlib.simple_object_detector_training_options()
options.C = 5  # Set the SVM C parameter to 5.  
dlib.train_simple_object_detector("training.xml", "detector.svm", options)
That will run the trainer and save the learned detector to a file called detector.svm. The training data is read from training.xml which contains a list of images and bounding boxes. The example that comes with dlib shows the format of the XML file, and there is also a graphical tool included that lets you mark up images with a mouse and save these XML files.
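To give a rough idea of that format, a minimal training.xml might look something like this (the image name and box coordinates are placeholders):
<?xml version='1.0' encoding='ISO-8859-1'?>
<dataset>
    <images>
        <image file='obama.jpg'>
            <box top='75' left='120' width='90' height='90'/>
        </image>
    </images>
</dataset>
Finally, to load your custom detector you do: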
detector = dlib.simple_object_detector("detector.svm")
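Once loaded, your custom detector is used exactly like the face detector above.  For example (with a placeholder image name):
img = io.imread('test.jpg')
boxes = detector(img)
print "number of objects detected:", len(boxes)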
If you want to try it out yourself you can download the new dlib release here.

Thursday, April 3, 2014

MITIE: A completely free and state-of-the-art information extraction tool

I work at an MIT lab and there are a lot of cool things about my job. In fact, I could go on all day about it, but in this post I want to talk about one thing in particular, which is that we recently got funded by the DARPA XDATA program to make an open source natural language processing library focused on information extraction.

Why make such a thing when there are already open source libraries out there for this (e.g. OpenNLP, NLTK, Stanford IE, etc.)? Well, if you look around you quickly find out that everything which exists is either expensive, not state-of-the-art, or GPL-licensed. If you want to use this kind of NLP tool in a non-GPL project then you are either out of luck, have to pay a lot of money, or settle for something of low quality. Well, not anymore! We just released the first version of our MIT Information Extraction library which is built using state-of-the-art statistical machine learning tools.

At this point it has just a C API and an example program showing how to do English named entity recognition. Over the next few weeks we will be adding bindings for other languages like Python and Java. We will also be adding a lot more NLP tools in addition to named entity recognition, starting with relation extractors and part-of-speech taggers. But in the meantime you can use the C API or the streaming command line program.  For example, if you had the following text in a file called sample_text.txt:
Meredith Vieira will become the first woman to host Olympics primetime coverage on her own when she fills in on Friday night for the ailing Bob Costas, who is battling a continuing eye infection.
Then you can simply run:
cat sample_text.txt | ./ner_stream MITIE-models/ner_model.dat
And you get this as output:
 [PERSON Meredith Vieira] will become the first woman to host [MISC Olympics] primetime coverage on her own when she fills on Friday night for the ailing [PERSON Bob Costas] , who is battling a continuing eye infection .
It's all up on github so if you want to try it out yourself then just run these commands and off you go:
git clone https://github.com/mit-nlp/MITIE.git
cd MITIE
./fetch_submodules.sh
make examples
make MITIE-models
cat sample_text.txt | ./ner_stream MITIE-models/ner_model.dat

Monday, February 3, 2014

Dlib 18.6 released: Make your own object detector!

I just posted the next version of dlib, v18.6.  There are a bunch of nice changes, but the most exciting addition is a tool for creating histogram-of-oriented-gradient (HOG) based object detectors.  This is a technique for detecting semi-rigid objects in images which has become a classic computer vision method since its publication in 2005.  In fact, the original HOG paper has been cited over 7000 times, which, for those of you who don't follow the academic literature, is a whole lot.

But back to dlib, the new release has a tool that makes training HOG detectors super fast and easy.  For instance, here is an example program that shows how to train a human face detector.  All it needs as input is a set of images and bounding boxes around faces.  On my computer it takes about 6 seconds to do its training using the example face data provided with dlib.  Once finished it produces a HOG detector capable of detecting faces.  An example of the detector's output on a new image (i.e. one it wasn't trained on) is shown below:


You should compare this to the time it takes to train OpenCV's popular cascaded Haar object detector, which is generally reported to take hours or days to train and requires you to fiddle with false negative rates and all kinds of spurious parameters.  HOG training is considerably simpler.

Moreover, the HOG trainer uses dlib's structural SVM solver which enables it to train on all the sub-windows in every image.  This means you don't have to perform any tedious subsampling or "hard negative mining".  It also means you often don't need that much training data.  In particular, the example program that trains a face detector takes in only 4 images, containing a total of 18 faces.  That is sufficient to produce the HOG detector used above.  The example also shows you how to visualize the learned HOG detector, which in this case looks like:


It looks like a face!  It should be noted that it's worth training on more than 4 images, since it doesn't take that long to label and train on at least a few hundred objects, and doing so can improve the accuracy.  In particular, I trained a HOG face detector using about 3000 images from the Labeled Faces in the Wild dataset and the training took only about 3 minutes.  3000 is probably excessive, but who cares when training is so fast.

The face detector which was trained on the Labeled Faces in the Wild data comes with the new version of dlib. You can see how to use it in this face detection example program.  The underlying detection code in dlib will make use of SSE instructions on Intel CPUs and this makes dlib's HOG detectors run at the same speed as OpenCV's fast cascaded object detectors.  So for something like a 640x480 resolution web camera it's fast enough to run in real-time.  As for the accuracy, it's easy to get the same detection rate as OpenCV but with thousands of times fewer false alarms.  You can see an example in this YouTube video which compares OpenCV's face detector to the new HOG face detector in dlib.  The circles are from OpenCV's default face detector and the red squares are dlib's HOG based face detector.  The difference is night and day.
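As a rough sketch of what a real-time setup can look like, here is one way to run the detector on webcam frames using the Python API from the 18.7 release described above.  Using OpenCV for frame capture is my own choice here; any frame source works:
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
cap = cv2.VideoCapture(0)
win = dlib.image_window()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV delivers BGR frames while dlib expects RGB.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    faces = detector(rgb)
    win.clear_overlay()
    win.set_image(rgb)
    win.add_overlay(faces)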


Finally, here is another fun example.  Before making this post I downloaded 8 images of stop signs from Google images, drew bounding boxes on them and then trained a HOG detector.  This is the detector I got after a few seconds of training:


It looks like a stop sign and testing it on a new image works great.


Altogether it took me about 5 minutes to go from not having any data at all to a working stop sign detector.  Not too shabby.  Go try it out yourself.  You can get the new dlib release here :)

Friday, March 9, 2007

Adding a web interface to a C++ application

One thing that is always sort of a pain is setting up a graphical user interface. This is especially true if you are making an embedded application or something that functions more as a system service or daemon. In this case you probably end up creating some simple network protocol which you use to control your application via some other remote piece of software, or just telnet if you are feeling especially lazy.

What would be nice, however, is to have a web based interface but not have to jump through a bunch of hoops to add it into your code. This sort of simple add-in HTTP server is exactly what this post is all about.

A few months ago I was spending Christmas with the family. Good times all around, but they live out in the middle of nowhere. And when I say nowhere I'm talking no TV, 1 bar on the cell phone during a 'clear' conversation, no internet, and some farm animals. We do have what is seemingly an ex-circus goat that walks around by balancing on just its front two feet. You can imagine how entertaining that is, but even so it only occupies you for so long. So naturally I broke out my laptop and created just the thing I have always wanted, a simple HTTP server I can stick into my C++ applications. I love Christmas :)

The code for the thing is available from sourceforge. There is also a more involved example program that can be viewed in its entirety here, but for the sake of creating a somewhat tidy little tutorial I'll show a simple example.
#include <dlib/server.h>
using namespace dlib;

class web_server : public server_http
{    
    const std::string on_request (
        const incoming_things& incoming,
        outgoing_things& outgoing    
    )
    {
        return "<html><body>Hurray for simple things!</body></html>";
    }
};

int main()
{
    web_server our_web_server;
    our_web_server.set_listening_port(80);
    our_web_server.start();
}
Basically what is happening here, if it isn't already obvious, is that we are defining a class which acts as our HTTP server. To do this all you need to do is inherit from server_http and implement the virtual function on_request(). To turn it on just set the listening port number and call start(). That's it. If you compiled this and ran it you could check out the page it creates by pointing your browser at http://localhost/, where you would see a page with the single line of text "Hurray for simple things!".

For an explanation of the arguments to the on_request() function check out the example program linked to above and/or check out the documentation on the dlib web site.