Saturday, September 23, 2017

Fast Multiclass Object Detection in Dlib 19.7

The new version of dlib is out, and the biggest new feature is the ability to train multiclass object detectors with dlib's convolutional neural network tooling.  The previous version only allowed you to train single-class detectors, but this release adds the option to create a single CNN model that outputs multiple labels.  As an example, I created a small 894-image dataset in which I annotated the fronts and rears of cars, and used it to train a 2-class detector.  You can see the resulting detector running in this video:

If you want to run the car detector from this video on your own images you can check out this example program.
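To make "one model, multiple labels" concrete, here is a minimal sketch of what consuming such a detector's output might look like. It does not call dlib. The `Detection` class, the `group_by_label` helper, the label strings, and all the numbers are hypothetical stand-ins chosen for illustration. The only point is that each detection carries its own class label, so one pass over the image yields both fronts and rears.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical stand-in for one labeled detection (not a dlib type)."""
    label: str         # class name, e.g. "front" or "rear"
    confidence: float  # detector score; higher means more confident
    box: tuple         # (left, top, right, bottom) in pixels


def group_by_label(detections, min_confidence=0.0):
    """Bucket detections by class label, dropping low-confidence ones."""
    grouped = defaultdict(list)
    for det in detections:
        if det.confidence >= min_confidence:
            grouped[det.label].append(det)
    return dict(grouped)


# Made-up detections, as a single multiclass model might return them.
detections = [
    Detection("front", 1.2, (10, 20, 110, 90)),
    Detection("rear", 0.8, (200, 40, 320, 130)),
    Detection("rear", 0.1, (400, 60, 450, 100)),
]

by_label = group_by_label(detections, min_confidence=0.5)
for label, dets in sorted(by_label.items()):
    print(label, len(dets))
```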

I've also improved the detector speed in dlib 19.7 by pushing more of the processing to the GPU, which makes the detector about 2.5x faster.  For example, on the 928x478 image used in this example program, the detector ran at 39fps in the previous version of dlib but now runs at 98fps (on an NVIDIA 1080 Ti).
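As a quick check on those numbers, the arithmetic below converts the two frame rates into a speedup factor and a per-frame time. The helper names are just for illustration. The only inputs are the 39fps and 98fps figures quoted above.

```python
def speedup(old_fps, new_fps):
    """Ratio of the new frame rate to the old one."""
    return new_fps / old_fps


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


old_fps, new_fps = 39, 98  # figures quoted in the post

print(f"speedup: {speedup(old_fps, new_fps):.2f}x")
print(f"per-frame time: {frame_time_ms(old_fps):.1f} ms -> "
      f"{frame_time_ms(new_fps):.1f} ms")
# prints:
# speedup: 2.51x
# per-frame time: 25.6 ms -> 10.2 ms
```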

This release also includes a new 5-point face landmarking model that finds the corners of the eyes and the bottom of the nose:

The new model is over 10x smaller than the 68-point landmarking model included with dlib: 8.8MB compared to the 68-point model's 96MB.  It also runs faster and, even more importantly, works with the state-of-the-art CNN face detector in dlib as well as the older HOG face detector.  The central use case of the 5-point model is 2D face alignment for applications like face recognition.  In any of the dlib code that does face alignment, the new 5-point model is a drop-in replacement for the 68-point model, and it is in fact the new recommended model to use with dlib's face recognition tooling.
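To give a feel for what 2D face alignment means, here is a minimal sketch that computes the similarity transform (rotation, uniform scale, and translation) carrying two eye positions onto fixed target positions in an output crop. This is not dlib's implementation, which fits against its own reference layout using the landmarks the model provides. The function names, the two-point simplification, and all coordinates are assumptions made for illustration. Points are treated as complex numbers so the transform is simply z' = m*z + t.

```python
import math


def similarity_from_two_points(src_a, src_b, dst_a, dst_b):
    """Return (m, t) such that m*z + t maps src_a -> dst_a and src_b -> dst_b.

    Points are (x, y) tuples. m is a complex number encoding rotation and
    uniform scale; t is a complex translation.
    """
    s1, s2 = complex(*src_a), complex(*src_b)
    d1, d2 = complex(*dst_a), complex(*dst_b)
    if s1 == s2:
        raise ValueError("source points must be distinct")
    m = (d2 - d1) / (s2 - s1)
    t = d1 - m * s1
    return m, t


def apply_transform(m, t, point):
    """Apply the similarity transform to one (x, y) point."""
    z = m * complex(*point) + t
    return (z.real, z.imag)


# Made-up eye positions in a tilted face, and where we want them to end up
# in a hypothetical 150x150 aligned crop.
eye_a, eye_b = (120.0, 100.0), (180.0, 130.0)
target_a, target_b = (50.0, 60.0), (100.0, 60.0)

m, t = similarity_from_two_points(eye_a, eye_b, target_a, target_b)
scale = abs(m)
angle_deg = math.degrees(math.atan2(m.imag, m.real))

print(apply_transform(m, t, eye_a))
print(apply_transform(m, t, eye_b))
print(f"scale={scale:.3f}, rotation={angle_deg:.2f} degrees")
```

Every other pixel or landmark in the image can be passed through the same `apply_transform` call, which is what makes the resulting crops comparable from one face to the next.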