Thursday, August 28, 2014

Real-Time Face Pose Estimation

I just posted the next version of dlib, v18.10, and it includes a number of new minor features.  The main addition in this release is an implementation of an excellent paper from this year's Computer Vision and Pattern Recognition Conference:
One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan
As the name suggests, it allows you to perform face pose estimation very quickly. In particular, this means that if you give it an image of someone's face it will add this kind of annotation:

In fact, this is the output of dlib's new face landmarking example program on one of the images from the HELEN dataset.  To get an even better idea of how well this pose estimator works, take a look at this video where it has been applied to each frame:


It doesn't just stop there though.  You can use this technique to make your own custom pose estimation models.  To see how, take a look at the example program for training these pose estimation models.

38 comments:

Hamilton 漢密頓 said...

well done

Rodrigo Benenson said...

Have you evaluated this implementation in terms of quality and/or speed? How does it compare to the numbers reported in the original research paper?

Davis King said...

Yes. The results are comparable to those reported in the paper both in terms of speed and accuracy.

Rodrigo Benenson said...

Sweet!

Stephen Moore said...

Does the "real time pose estimation algorithm" run a face detector on every frame, or does it use the previous frame's output to estimate the current frame?

Davis King said...

You can run it either way. The input to the pose estimator is a bounding box for a face and it outputs the pose.

The included example program shows how to get that bounding box from dlib's face detector but you could just as easily use the face pose from the previous frame to define the bounding box.
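
As a rough illustration of the second option, here is a minimal Python sketch of a tracking loop that calls the detector only when there is no pose from the previous frame. It is not dlib code: `detect` and `predict` are hypothetical callables standing in for a face detector and a pose estimator, and the `pad` margin is an arbitrary assumption.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def padded_box(points: Sequence[Point], pad: float = 0.1) -> Box:
    """Axis-aligned box around the landmarks, grown by `pad` of its size per side."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    dx = (max(xs) - min(xs)) * pad
    dy = (max(ys) - min(ys)) * pad
    return (min(xs) - dx, min(ys) - dy, max(xs) + dx, max(ys) + dy)


def track(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Box]],           # hypothetical face detector
    predict: Callable[[object, Box], List[Point]],       # hypothetical pose estimator
    pad: float = 0.1,                                    # assumed margin, not a dlib value
) -> Iterator[Optional[List[Point]]]:
    """Yield landmarks for each frame, detecting only when no previous pose exists."""
    previous: Optional[List[Point]] = None
    for frame in frames:
        # Reuse the last pose if we have one, otherwise fall back to the detector.
        box = padded_box(previous, pad) if previous else detect(frame)
        previous = predict(frame, box) if box is not None else None
        yield previous
```

If the detector finds nothing, the loop yields `None` for that frame and tries the detector again on the next one. A real tracker would probably also re-run the detector from time to time to recover from drift.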

Amanda Sgroi said...

In the paper "One Millisecond Face Alignment ..." they output 194 landmark points on the face; however, the implementation provided in dlib only outputs 68 points. Is there a way to easily produce the 194 points using the code provided in dlib?

Davis King said...

I only included the 68 point style model used by the iBUG 300-W dataset in this dlib release. However, if you want to train a 194 point model you can do so pretty easily by following the example here: http://dlib.net/train_shape_predictor_ex.cpp.html

You can get the training data from the HELEN dataset webpage: http://www.ifp.illinois.edu/~vuongle2/helen/

drjo said...

I compiled the example from v18.10 and get this error: "DLIB_JPEG_SUPPORT not #defined Unable to load the image in file ..\faces\2007_007763.jpg."

Can you please help me out?

Davis King said...

You need to tell your compiler to add a #define for DLIB_JPEG_SUPPORT and then link it with libjpeg.

If you are unsure how to configure your compiler to do this, then I would suggest using CMake (following the directions at http://dlib.net/compile.html). CMake will set all this stuff up for you.

Xan63 said...

Hi, thanks for dlib!
I also have an issue with JPEG (Windows 7, Visual Studio, and CMake) when compiling dlib:
error C2371: 'INT32' : redefinition; different basic types, in jmorecfg.h

It compiles (and works) just fine without JPEG support.

Xan63 said...

Answering myself: if I leave JPEG_LIBRARY and JPEG_INCLUDE_DIR empty in cmake-gui, then dlib is still compiled with JPEG support, despite CMake telling me: Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR)
Not sure what is going on, but it works...

Davis King said...

CMake will try to find a version of libjpeg that is installed on your system and use that. If it can't find a system version of libjpeg, it prints out that it didn't find it. I then have CMake set up to statically compile the copy in the dlib/external/libjpeg folder when a system install of libjpeg is not found. So that's why you get that message.

More importantly, I want to make sure dlib always compiles cleanly with CMake. So can you post the exact commands you typed to get the "error C2371: 'INT32' : redefinition; different basic types" error in jmorecfg.h?

I don't get this on any of the systems I have. The string INT32 doesn't even appear in any code in the dlib folder so I'm not sure how this happened.

Xan63 said...

That explains a lot...
As for the commands, I use cmake-gui, so I just throw the CMakeLists.txt in there and everything works fine, except for that error message about JPEG (Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR)).

If I try to fix it (now I understand that I don't need to) and fill in JPEG_INCLUDE_DIR and JPEG_LIBRARY in cmake-gui, for example using the libjpeg that comes with OpenCV, then I get this C2371: 'INT32' error when compiling (with Visual Studio 2012).

Davis King said...

Ok, that makes sense. I'll add a print statement to the CMakeLists.txt file so it's clearer what is happening in this case :)

Ked Su said...

This comment has been removed by the author.

Davis King said...

That google drive link doesn't work for me. Can you post the image another way? Also, is the image extremely large? That's the only way I would expect an out of memory error.

Ked Su said...

This comment has been removed by the author.

Davis King said...

Huh, I don't know what's wrong. That's not a large enough image to cause an out of memory error. I also tried it on my computer and it works fine.

What system and compiler are you using? Also, what is the exact error message you get when you run the image through the face_landmark_detection_ex example program that comes with dlib?

Ked Su said...

This comment has been removed by the author.

Davis King said...

Cool. No worries :)

Cheers,
Davis

mohanraj said...

I am facing a problem while trying to run the face detection program in Visual Studio 2012: DLIB_JPEG_SUPPORT is not defined.
How do I fix this problem?

Davis King said...

Try compiling it with CMake. The instructions are shown here: http://dlib.net/compile.html

mohanraj said...

I compiled the files in dlib's examples folder with CMake. How do I test the program now?

Davis King said...

Then you run the face_landmark_detection_ex executable.

Shengyin Wu said...

Can you tell me the parameters you used when training on the iBUG dataset?

Davis King said...

If I recall correctly, when training on iBUG I used the default dlib parameter settings except I set the cascade depth to 15 instead of 10.

Jess said...

I am wondering if you can help me with a speed issue I am having.

I am trying to set up a test using my laptop's webcam (OpenCV) to add the face pose overlay in real time using the example code provided.

The face detector and full_object_detection functions seem to be taking multiple seconds per frame to compute (480x640).

I have compiled dlib using CMake with Visual Studio 2013, with the 64-bit and AVX flags.

I was wondering if you could point me in the right direction to reach this one millisecond number the paper boasts.

Davis King said...

Did you compile in release or debug mode?

Jess said...

Ah, yes, that was the problem. I had assumed that setting CMake to release would also build the library in release mode, but I had only changed the example code's build settings in VS.

Thanks!

Shengyin Wu said...

When training on the iBUG dataset, did you generate the bounding boxes yourself, or just use the bounding boxes supplied by the 300 Faces in-the-Wild competition?

Davis King said...

I generated the bounding boxes using dlib's included face detector. This way, the resulting model is calibrated to work well with dlib's face detector.

Shengyin Wu said...

If the detector failed to detect the face, how did you generate the bounding box? Thanks for your reply.

Davis King said...

In that case I generated it based on the landmark positions. However, I made sure the box was sized and positioned in the same way the dlib detector would have output if it had detected it (e.g. centered on the nose and at a certain scale relative to the whole face).
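
To make that idea concrete, here is a minimal Python sketch of building a square box centered on the nose from a set of landmarks. It is only an illustration and not the code used to train the dlib model: the `scale` factor is a hypothetical placeholder, and `nose_index=30` assumes the common 0-based 68 point iBUG ordering, in which index 30 is the nose tip.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def detector_style_box(
    points: Sequence[Point],
    nose_index: int = 30,   # assumption: nose tip in the 0-based 68 point layout
    scale: float = 1.0,     # hypothetical: box side relative to the landmark extent
) -> Box:
    """Square box centered on the nose, sized relative to the landmark extent."""
    if not points:
        raise ValueError("need at least one landmark")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Use the larger of the landmark width and height so the box stays square.
    side = scale * max(max(xs) - min(xs), max(ys) - min(ys))
    cx, cy = points[nose_index]
    half = side / 2.0
    return (cx - half, cy - half, cx + half, cy + half)
```

In practice the scale would need to be tuned so that the generated boxes match the boxes the detector produces on faces it does find.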

Emre YAZICI said...

Hello, great work, and it works very fast.

Thanks.

Is there any method to estimate yaw, pitch, and roll from these estimated landmarks?

Emre YAZICI said...

This comment has been removed by the author.

Davis King said...

Thanks.

The output is just the landmarks.
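
Since the output is only 2D landmarks, any head angle has to be derived separately. As a rough sketch of the simplest case, in-plane rotation (roll) can be approximated from the line between the two eye centers. This is not part of dlib, and it assumes the common 0-based 68 point iBUG ordering, in which indices 36 to 41 and 42 to 47 are the two eyes. Yaw and pitch need a 3D face model and are not covered here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# Assumption: 0-based 68 point layout; 36-41 is the eye on the image's left side.
IMAGE_LEFT_EYE = range(36, 42)
IMAGE_RIGHT_EYE = range(42, 48)


def _center(points: Sequence[Point], indices: range) -> Point:
    """Mean position of the selected landmarks."""
    xs = [points[i][0] for i in indices]
    ys = [points[i][1] for i in indices]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def estimate_roll_degrees(points: Sequence[Point]) -> float:
    """Angle of the line between the eye centers, in degrees.

    Image y grows downward, so a positive value means the eye on the
    image's right side sits lower than the eye on the left.
    """
    if len(points) < 48:
        raise ValueError("expected at least 48 landmarks")
    ax, ay = _center(points, IMAGE_LEFT_EYE)
    bx, by = _center(points, IMAGE_RIGHT_EYE)
    return math.degrees(math.atan2(by - ay, bx - ax))
```

The estimate gets less reliable as the head turns away from the camera, because the eye landmarks then no longer lie on a line that is horizontal in the face's own frame.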