Saturday, December 20, 2014
Dlib 18.12 released
I just released the next version of dlib. This time I added tools for computing 2D FFTs, Hough transforms, and image skeletonizations, as well as a simple and type-safe API for calling C++ code from MATLAB. Readers familiar with writing MATLAB mex functions know how much of a pain it is, but no longer! Below is an example of a C++ function callable from MATLAB using dlib's new MATLAB binding API. You can also compile it with CMake, so building it is super easy; there is an example CMake file in the dlib/matlab folder showing how to set it up. I also used this tool to give the MITIE project a simple MATLAB API, so you can see another example of how easy it is to set this up in the MITIE MATLAB example.
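With this API, a mex function is just an ordinary C++ function: the binding layer looks at the function's signature and maps const reference arguments to MATLAB inputs and non-const reference arguments to MATLAB outputs. Here is a minimal sketch of the pattern, modeled on the example_mex_function.cpp file in the dlib/matlab folder (this particular function is my own toy illustration, not the exact example shipped with dlib):

#include "dlib/matrix.h"
using namespace dlib;

// Compute a weighted average of two MATLAB matrices. The const reference
// arguments are inputs from MATLAB and the non-const reference argument
// is returned back to MATLAB.
void mex_function (
    const matrix<double>& A,
    const matrix<double>& B,
    double weight,
    matrix<double>& result
)
{
    result = weight*A + (1-weight)*B;
}

// This include pulls in the wrapper machinery that turns mex_function()
// into a MATLAB-callable mex file.
#include "mex_wrapper.cpp"

Once compiled, you call it from MATLAB like any other mex function, e.g. result = example_mex(A, B, 0.5), where the function name is whatever module name you gave in your CMake file.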
There are also some fun new things in the pipe for the next dlib release (v18.13). First, Patrick Snape, one of the main developers of the menpo project, is adding a Python interface to dlib's shape prediction tools. You can follow that over on dlib's github repo. I'm also working on a single object tracker for OpenCV's Vision Challenge which I plan to include in the next version of dlib.
Saturday, November 15, 2014
Dlib 18.11 released
The new version of dlib is out. This release contains mostly minor bug fixes and usability improvements, with the notable exceptions of new routines for extracting local-binary-pattern (LBP) features from images and improved tools for learning distance metrics. See the release notes for further information.
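As a quick taste of the LBP additions, here is a minimal C++ sketch (a hedged illustration based on my reading of the release notes; I'm assuming make_uniform_lbp_image() is the relevant new routine, so check the documentation for the exact names):

#include <dlib/image_transforms.h>
#include <dlib/image_io.h>
using namespace dlib;

int main()
{
    // Load a grayscale image.
    array2d<unsigned char> img;
    load_image(img, "face.jpg");

    // Compute an image of uniform local-binary-pattern codes, where each
    // output pixel is a code summarizing the texture around that pixel.
    array2d<unsigned char> lbp;
    make_uniform_lbp_image(img, lbp);

    return 0;
}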
I also recently found out about two particularly interesting projects that use dlib. The first is menpo, a Python library focused on computer vision which is being developed by a team at Imperial College London. If you are interested in a Python library that pulls together a bunch of computer vision tools then definitely check it out. The other interesting project is Ceemple, which is basically an interactive language shell for C++. They have integrated a bunch of libraries like dlib and OpenCV into it with the general goal of making C++ development feel more rapid and interactive. So think of something like MATLAB or IPython, but for C++.
Tuesday, October 21, 2014
MITIE v0.3 Released: Now with Java and R APIs
We just made the next release of MITIE, a new DARPA funded information extraction tool being created by our team at MIT. This release is relatively minor and just adds APIs for Java and R. The project page on github explains how to get started using either of these APIs.
I want to take some time and explain how the Java API is implemented since, as I discovered while making MITIE's Java API, there aren't clear instructions for doing this anywhere on the internet. So hopefully this little tutorial will help you if you decide to make a similar Java binding to a C++ library. To begin, let's think about the requirements for a good Java binding:
- You should be able to compile it from source with a simple command
- A user of your library should not need to edit or configure anything to compile the API
- The compilation process should work on any platform
- Writing JNI is awful so you shouldn't have to do that
The MITIE Java API satisfies all of these requirements, and you can compile it with these commands:

mkdir build
cd build
cmake ..
cmake --build . --config Release --target install
That creates a jar file and shared library file which together form the MITIE Java API. Let's run through a little example to see how you can define new Java APIs. Imagine you have created a simple C++ API that looks like this:
void printSomeString (const std::string& message);

class MyClass
{
public:
    std::vector<std::string> getSomeStrings() const;
};
and you want to be able to use it from Java. You just need to put this C++ API in a header file called swig_api.h and include some SWIG commands that tell it what to call std::vector<std::string> in the generated Java API. So the contents of swig_api.h would look like:
// Define some swig type maps that tell swig what to call various
// instantiations of std::vector.
#ifdef SWIG
%include "std_string.i"
%include "std_vector.i"
%template(StringVector) std::vector<std::string>;
#endif

#include <string>
#include <vector>

void printSomeString (const std::string& message);

class MyClass
{
public:
    std::vector<std::string> getSomeStrings() const;
};

The next step is to create a CMakeLists.txt file that tells CMake how to compile your API. In our case, it would look like:
cmake_minimum_required (VERSION 2.8.4)

project(example)

set(java_package_name edu.mit.ll.example)

# List the source files you want to compile into the Java API. These contain
# things like implementations of printSomeString() and whatever else you need.
set(source_files
    my_source.cpp
    another_source_file.cpp
)

# List the folders that contain your header files
include_directories(
    .
)

# List of libraries to link to. For example, you might need to link to pthread
set(additional_link_libraries pthread)

# Tell CMake to put the compiled shared library and example.jar file into the
# same folder as this CMakeLists.txt file when the --target install option is
# executed. You can put any folder here, just give a path that is relative to
# the CMakeLists.txt file.
set(install_target_output_folder .)

include(cmake_swig_jni)
That's it. Now you can compile your Java API using CMake and you will get an example.jar and example.dll or libexample.so file depending on your platform. Then to use it you can write java code like this:
import edu.mit.ll.example.*;

public class Example
{
    public static void main(String args[])
    {
        global.printSomeString("hello world!");
        MyClass obj = new MyClass();
        StringVector temp = obj.getSomeStrings();
        for (int i = 0; i < temp.size(); ++i)
            System.out.println(temp.get(i));
    }
}
and execute it via:
javac -classpath example.jar Example.java
java -classpath example.jar;. -Djava.library.path=. Example
assuming the example.jar and shared library are in your current folder. Note that Linux or OS X users will need to use a : as the classpath separator rather than the ; required on Windows. But that's it! You just made a Java interface to your C++ library. You might have noticed the include(cmake_swig_jni) statement though. That is a bunch of CMake magic I had to write to make all this work, but work it does, and on different platforms without trouble. You can see a larger example of a Java to C++ binding in MITIE's github repo using this same setup.
Thursday, August 28, 2014
Real-Time Face Pose Estimation
I just posted the next version of dlib, v18.10, and it includes a number of new minor features. The main addition in this release is an implementation of an excellent paper from this year's Computer Vision and Pattern Recognition Conference:
One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan

As the name suggests, it allows you to perform face pose estimation very quickly. In particular, this means that if you give it an image of someone's face it will add this kind of annotation:
In fact, this is the output of dlib's new face landmarking example program on one of the images from the HELEN dataset. To get an even better idea of how well this pose estimator works take a look at this video where it has been applied to each frame:
It doesn't just stop there though. You can use this technique to make your own custom pose estimation models. To see how, take a look at the example program for training these pose estimation models.
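To give a flavor of the new API, here is a minimal C++ sketch of the usage side, in the spirit of the face landmarking example program (the image name is illustrative, and I'm assuming the 68 point landmarking model file distributed on dlib.net):

#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_processing.h>
#include <dlib/image_io.h>
#include <iostream>
using namespace dlib;

int main()
{
    // Load the face detector along with the trained shape model.
    frontal_face_detector detector = get_frontal_face_detector();
    shape_predictor sp;
    deserialize("shape_predictor_68_face_landmarks.dat") >> sp;

    array2d<rgb_pixel> img;
    load_image(img, "faces.jpg");

    // Find the faces, then fit the landmark model to each one.
    std::vector<rectangle> dets = detector(img);
    for (unsigned long i = 0; i < dets.size(); ++i)
    {
        full_object_detection shape = sp(img, dets[i]);
        // shape.part(k) is the pixel location of the k-th landmark.
        std::cout << "first landmark: " << shape.part(0) << std::endl;
    }
    return 0;
}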
Thursday, July 10, 2014
MITIE v0.2 Released: Now includes Python and C++ APIs for named entity recognition and binary relation extraction
A few months ago I posted about MITIE, the new DARPA funded information extraction tool being created by our team at MIT. At the time it only provided English named entity recognition and sported a simple C API. Since then we have been busy adding new features and today we released a new version of MITIE which adds a bunch of nice things, including:
- Python and C++ APIs
- Many example programs
- 21 English binary relation extractors which identify pairs of entities with certain relations. E.g. "PERSON BORN_IN PLACE"
- Python, C, and C++ APIs for training your own named entity and binary relation extractors
You can get MITIE from its github page. Then you can try out some of the new features in v0.2, one of which is binary relation extraction. This means you can ask MITIE if two entities participate in some known relationship. For example, you can ask if a piece of text is making the claim that a person was born in a location, i.e., are the person and location entities participating in the "born in" relationship?
In particular, you could run MITIE over all the Wikipedia articles that mention Barack Obama and find each instance where someone made the claim that Barack Obama was born in some place. I did this with MITIE and found the following:
- 14 claims that Barack Obama was born in Hawaii
- 5 claims that Barack Obama was born in the United States
- 3 claims that Barack Obama was born in Kenya
Which is humorous. One of them is the sentence:
You can still find sources of that type which still assert that "Barack Obama was born in Kenya"
When you read it in the broader context of the article it's clear that it's not claiming he was born in Kenya. So this is a good example of why it's important to aggregate over many relation instances when using a relation extractor. By aggregating many examples we can get reasonably accurate outputs in the face of these kinds of mistakes.
However, what is even more entertaining than poking fun at American political dysfunction is MITIE's new API for creating your own entity and relation extractors. We worked to make this very easy to use; in particular, there are no parameters you need to mess with, since everything is dealt with internally by MITIE. All you, the user, need to do is give example data showing what you want MITIE to learn to detect and it takes care of the rest. Moreover, in the spirit of easy-to-use APIs, we also added a new Python API that allows you to exercise all the functionality in MITIE via Python. As a little example, here is how you use it to find named entities:
from mitie import *
ner = named_entity_extractor('MITIE-models/english/ner_model.dat')
tokens = tokenize("The MIT Information Extraction (MITIE) tool was created \
by Davis King, Michael Yee, and Wade Shen at the \
Massachusetts Institute of Technology.")
print tokens
This loads in the English named entity recognizer model that comes with MITIE and then tokenizes the sentence. So the print statement produces
['The', 'MIT', 'Information', 'Extraction', '(', 'MITIE', ')', 'tool', 'was', 'created', 'by', 'Davis', 'King', ',', 'Michael', 'Yee', ',', 'and', 'Wade', 'Shen', 'at', 'the', 'Massachusetts', 'Institute', 'of', 'Technology', '.']
Then to find the named entities we simply do
entities = ner.extract_entities(tokens)
print "Number of entities detected:", len(entities)
print "Entities found:", entities
Which prints:
Number of entities detected: 6
Entities found: [(xrange(1, 4), 'ORGANIZATION'), (xrange(5, 6), 'ORGANIZATION'), (xrange(11, 13), 'PERSON'), (xrange(14, 16), 'PERSON'), (xrange(18, 20), 'PERSON'), (xrange(22, 26), 'ORGANIZATION')]
So the output is just a list of ranges and labels. Each range indicates which tokens are part of that entity. To print these out in a nice list we would just do
for e in entities:
    range = e[0]
    tag = e[1]
    entity_text = " ".join(tokens[i] for i in range)
    print tag + ": " + entity_text
Which prints:
ORGANIZATION: MIT Information Extraction
ORGANIZATION: MITIE
PERSON: Davis King
PERSON: Michael Yee
PERSON: Wade Shen
ORGANIZATION: Massachusetts Institute of Technology
Wednesday, April 9, 2014
Dlib 18.7 released: Make your own object detector in Python!
A while ago I boasted about how dlib's object detection tools are better than OpenCV's. However, one thing OpenCV had on dlib was a nice Python API, but no longer! The new version of dlib is out and it includes a Python API for using and creating object detectors. What does this API look like? Well, let's start by imagining you want to detect faces in this image:
You would begin by importing dlib and scikit-image:
import dlib
from skimage import io

Then you load dlib's default face detector, the image of Obama, and then invoke the detector on the image:
detector = dlib.get_frontal_face_detector()
img = io.imread('obama.jpg')
faces = detector(img)

The result is an array of boxes called faces. Each box gives the pixel coordinates that bound each detected face. To get these coordinates out of faces you do something like:
for d in faces:
    print "left,top,right,bottom:", d.left(), d.top(), d.right(), d.bottom()

We can also view the results graphically by running:
win = dlib.image_window()
win.set_image(img)
win.add_overlay(faces)
But what if you wanted to create your own object detector? That's easy too. Dlib comes with an example program and a sample training dataset showing how to do this. But to summarize, you do:
options = dlib.simple_object_detector_training_options()
options.C = 5  # Set the SVM C parameter to 5.
dlib.train_simple_object_detector("training.xml", "detector.svm", options)

That will run the trainer and save the learned detector to a file called detector.svm. The training data is read from training.xml, which contains a list of images and bounding boxes. The example that comes with dlib shows the format of the XML file. There is also a graphical tool included that lets you mark up images with a mouse and save these XML files. Finally, to load your custom detector you do:
detector = dlib.simple_object_detector("detector.svm")

If you want to try it out yourself you can download the new dlib release here.
Thursday, April 3, 2014
MITIE: A completely free and state-of-the-art information extraction tool
I work at an MIT lab and there are a lot of cool things about my job. In fact, I could go on all day about it, but in this post I want to talk about one thing in particular, which is that we recently got funded by the DARPA XDATA program to make an open source natural language processing library focused on information extraction.
Why make such a thing when there are already open source libraries out there for this (e.g. OpenNLP, NLTK, Stanford IE, etc.)? Well, if you look around you quickly find out that everything which exists is either expensive, not state-of-the-art, or GPL licensed. If you wanted to use this kind of NLP tool in a non-GPL project then you are either out of luck, have to pay a lot of money, or settle for something of low quality. Well, not anymore! We just released the first version of our MIT Information Extraction library which is built using state-of-the-art statistical machine learning tools.
At this point it has just a C API and an example program showing how to do English named entity recognition. Over the next few weeks we will be adding bindings for other languages like Python and Java. We will also be adding a lot more NLP tools in addition to named entity recognition, starting with relation extractors and part-of-speech taggers. But in the meantime you can use the C API or the streaming command line program. For example, if you had the following text in a file called sample_text.txt:
Meredith Vieira will become the first woman to host Olympics primetime coverage on her own when she fills on Friday night for the ailing Bob Costas, who is battling a continuing eye infection.

Then you can simply run:
cat sample_text.txt | ./ner_stream MITIE-models/ner_model.dat

And you get this as output:
[PERSON Meredith Vieira] will become the first woman to host [MISC Olympics] primetime coverage on her own when she fills on Friday night for the ailing [PERSON Bob Costas] , who is battling a continuing eye infection .
It's all up on github so if you want to try it out yourself then just run these commands and off you go:
git clone https://github.com/mit-nlp/MITIE.git
cd MITIE
./fetch_submodules.sh
make examples
make MITIE-models
cat sample_text.txt | ./ner_stream MITIE-models/ner_model.dat
Monday, February 3, 2014
Dlib 18.6 released: Make your own object detector!
I just posted the next version of dlib, v18.6. There are a bunch of nice changes, but the most exciting addition is a tool for creating histogram-of-oriented-gradient (HOG) based object detectors. This is a technique for detecting semi-rigid objects in images which has become a classic computer vision method since its publication in 2005. In fact, the original HOG paper has been cited over 7000 times, which, for those of you who don't follow the academic literature, is a whole lot.
But back to dlib, the new release has a tool that makes training HOG detectors super fast and easy. For instance, here is an example program that shows how to train a human face detector. All it needs as input is a set of images and bounding boxes around faces. On my computer it takes about 6 seconds to do its training using the example face data provided with dlib. Once finished it produces a HOG detector capable of detecting faces. An example of the detector's output on a new image (i.e. one it wasn't trained on) is shown below:
You should compare this to the time it takes to train OpenCV's popular cascaded Haar object detector, which is generally reported to take hours or days to train and requires you to fiddle with false negative rates and all kinds of spurious parameters. HOG training is considerably simpler.
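To make the comparison concrete, here is a condensed sketch of what HOG detector training looks like in C++, modeled on the example program mentioned above (file names and parameter values here are illustrative rather than the exact ones from the example):

#include <dlib/svm_threaded.h>
#include <dlib/image_processing.h>
#include <dlib/data_io.h>
#include <dlib/image_io.h>
#include <fstream>
using namespace dlib;

int main()
{
    // Load images and face bounding boxes from an XML dataset file (the
    // format produced by the imglab annotation tool included with dlib).
    dlib::array<array2d<unsigned char> > images;
    std::vector<std::vector<rectangle> > boxes;
    load_image_dataset(images, boxes, "training.xml");

    // Scan an 80x80 pixel HOG filter over an image pyramid.
    typedef scan_fhog_pyramid<pyramid_down<6> > image_scanner_type;
    image_scanner_type scanner;
    scanner.set_detection_window_size(80, 80);

    // Set up the trainer. C is the usual SVM regularization tradeoff.
    structural_object_detection_trainer<image_scanner_type> trainer(scanner);
    trainer.set_c(1);
    trainer.set_num_threads(4);
    object_detector<image_scanner_type> detector = trainer.train(images, boxes);

    // Save the learned detector to disk.
    std::ofstream fout("face_detector.svm", std::ios::binary);
    serialize(detector, fout);
    return 0;
}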
Moreover, the HOG trainer uses dlib's structural SVM based training algorithm which enables it to train on all the sub-windows in every image. This means you don't have to perform any tedious subsampling or "hard negative mining". It also means you often don't need that much training data. In particular, the example program that trains a face detector takes in only 4 images, containing a total of 18 faces. That is sufficient to produce the HOG detector used above. The example also shows you how to visualize the learned HOG detector, which in this case looks like:
It looks like a face! It should be noted that it's worth training on more than 4 images since it doesn't take that long to label and train on at least a few hundred objects and it can improve the accuracy. In particular, I trained a HOG face detector using about 3000 images from the labeled faces in the wild dataset and the training took only about 3 minutes. 3000 is probably excessive, but who cares when training is so fast.
The face detector which was trained on the labeled faces in the wild data comes with the new version of dlib. You can see how to use it in this face detection example program. The underlying detection code in dlib will make use of SSE instructions on Intel CPUs and this makes dlib's HOG detectors run at the same speed as OpenCV's fast cascaded object detectors. So for something like a 640x480 resolution web camera it's fast enough to run in real-time. As for the accuracy, it's easy to get the same detection rate as OpenCV but with thousands of times fewer false alarms. You can see an example in this youtube video which compares OpenCV's face detector to the new HOG face detector in dlib. The circles are from OpenCV's default face detector and the red squares are dlib's HOG based face detector. The difference is night and day.