Sunday, January 14, 2018

Correctly Mirroring Datasets

I get asked a lot of questions about dlib's landmarking tools, and many of the most common ones are about how to prepare a good training dataset. One of the most useful tricks for building such a dataset is to mirror the data, since this effectively doubles the amount of training data. However, if you do this naively you end up with a terrible training dataset that produces really awful landmarking models, and a lot of the questions I get are about why that happens.

To understand the issue, consider the following image of an annotated face from the iBUG 300-W dataset:


Since the mirror image of a face is still a face, we can mirror images like this to get more training data. However, what happens if you simply mirror the annotations? You end up with the wrong annotation labels! To see this, take a look at the figure below. The left image shows what happens if you naively mirror the above image and its landmarks. Note, for instance, that the points along the jawline are now annotated in reverse order. In fact, nearly all the annotations in the left image are wrong. Instead, you want to match the source image's labeling scheme. A mirrored image with the correct annotations is shown on the right.


Dlib's imglab tool has had a --flip option for a long time that would mirror a dataset for you. However, it used naive mirroring and it was left up to the user to adjust any landmark labels appropriately. Many users found this confusing, so in the new version of imglab (v1.13) the --flip command now performs automatic source label matching using a 2D point registration algorithm. That is, it left-right flips the dataset and annotations. Then it registers the mirrored landmarks with the original landmarks and transfers labels appropriately. In fact, the "source label matching" image on the right was created by the new version of imglab.
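
To make the idea concrete, here is a simplified sketch of the label matching step in C++. This is not imglab's actual implementation, just an illustration of the approach: mirror every point, crudely align the mirrored shape back onto the original, and then give each mirrored point the label of the nearest original point. The function name, the use of string labels, and the greedy nearest-neighbor matching are all just for illustration.

    #include <dlib/geometry.h>
    #include <limits>
    #include <map>
    #include <string>

    // Simplified sketch of mirroring with label matching (not imglab's exact code).
    // parts maps each label to its landmark position in the original image.
    std::map<std::string, dlib::dpoint> mirror_with_label_matching (
        const std::map<std::string, dlib::dpoint>& parts,
        long image_width
    )
    {
        // Step 1: naively mirror every point.  At this stage the labels are wrong,
        // e.g. a point annotated on the left side of the jaw now sits on the right.
        std::map<std::string, dlib::dpoint> flipped;
        for (auto& p : parts)
            flipped[p.first] = dlib::dpoint(image_width-1-p.second.x(), p.second.y());

        // Step 2: crudely register the mirrored points with the originals by
        // removing the horizontal offset introduced by the flip.
        double orig_cx = 0, flip_cx = 0;
        for (auto& p : parts)   orig_cx += p.second.x();
        for (auto& p : flipped) flip_cx += p.second.x();
        const double shift = orig_cx/parts.size() - flip_cx/flipped.size();

        // Step 3: transfer labels.  Each mirrored point keeps its mirrored position
        // but takes the label of the nearest original point.
        std::map<std::string, dlib::dpoint> relabeled;
        for (auto& f : flipped)
        {
            const dlib::dpoint aligned(f.second.x()+shift, f.second.y());
            double best_dist = std::numeric_limits<double>::max();
            std::string best_label;
            for (auto& o : parts)
            {
                const double d = dlib::length(aligned - o.second);
                if (d < best_dist) { best_dist = d; best_label = o.first; }
            }
            relabeled[best_label] = f.second;
        }
        return relabeled;
    }

A real implementation would solve a proper assignment problem rather than matching greedily, but the relabeling idea is the same: because the landmark layout is left-right symmetric, each mirrored point lands approximately on top of its opposite-side counterpart, and that counterpart's label is the one it should carry.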

Finally, just to be clear, the point registration algorithm will work on anything. It doesn't have to be iBUG's annotations. It doesn't have to be faces. It's a general point registration method that will work correctly for any kind of landmark annotated data with left-right symmetry. However, if you want the old --flip behavior you can use the new --flip-basic option to get a naive mirroring. But most users will want to use the new --flip.

20 comments:

  1. Thanks, Davis, for this clarification.
    Emad Omar

    ReplyDelete
  2. Which loss layer would you recommend for training a DNN-based landmark detector? My labels are 68-point landmarks with x,y coordinates. Is there a dlib tutorial covering that?

    ReplyDelete
  3. Use a mean squared loss I suppose, but there isn't any tutorial for it. I would find a CVPR paper that discusses this and follow that.
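
    Something along these lines might be a starting point (an untested sketch, not from any dlib example; the layer sizes and file names are placeholders, and note that 68 (x,y) points means 136 regression outputs):

    #include <dlib/dnn.h>
    #include <vector>
    using namespace dlib;

    // Untested sketch: regress 68 (x,y) landmarks, i.e. 136 values, with the
    // mean squared multioutput loss.  The conv/fc sizes are just placeholders.
    using net_type = loss_mean_squared_multioutput<fc<136,
                     relu<fc<512,
                     max_pool<2,2,2,2,relu<con<32,5,5,1,1,
                     max_pool<2,2,2,2,relu<con<16,5,5,1,1,
                     input<matrix<rgb_pixel>>
                     >>>>>>>>>>;

    int main()
    {
        // Training data: cropped face images and, for each one, a 136x1 column
        // vector holding x0,y0,x1,y1,...,x67,y67.
        std::vector<matrix<rgb_pixel>> images;
        std::vector<matrix<float,0,1>> labels;
        // ... fill images and labels from your annotated dataset ...

        net_type net;
        dnn_trainer<net_type> trainer(net);
        trainer.set_learning_rate(0.001);
        trainer.set_mini_batch_size(32);
        if (!images.empty())
            trainer.train(images, labels);
        serialize("landmark_net.dat") << net;
    }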

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. OK, so I am going to use loss_mean_squared_multioutput<fc<68,my_network>>. Is that correct? Also, is there a provision for visualizing the graph of the full network, just to make sure it is correct?

    ReplyDelete
  6. Thanks, Davis. We are working on adding three more points to the 5-point landmark data included in dlib: left mouth, right mouth, and chin.

    We've already annotated the data, but the training is exceptionally slow when compared to the 5-point example.

    The function randomly_generate_split_feature takes anywhere between one and five seconds per test split, practically grinding the training to a halt.

    When we don't flip or rotate (either one causes the slowdown), training is on par with the 5-point example.

    We did flip the new labels (left mouth and right mouth) accordingly.

    Do you have any tips on what we can attempt?

    ReplyDelete
  7. Flipping the data should have nothing to do with speed. You probably just have a bug in your code, are running out of RAM and swapping to disk, or forgot to turn on compiler optimizations.

    ReplyDelete
  8. Thanks for the quick response. I am running the exact same code that you provided for the 5-point example; I only modified the assert on 5 points and added an additional swap for the two mouth points.

    Just commenting out the flipping gives an estimate of 5 hours for the training. Keeping it in doesn't even produce an estimate after a couple of hours.

    ReplyDelete
  9. I have no idea what you are talking about. What code?

    ReplyDelete
  10. If you think this is something wrong with dlib then submit an issue report on github. Include details.

    ReplyDelete
  11. I'm referring to this

    I'm not sure if there is an actual issue or if I'm maybe missing some constraint. I'll make an issue on github just for easier collaboration and reproduction.

    ReplyDelete
  12. Hi, I work with dlib and I have a question. I want to extract the x, y coordinates of the mouth corners. How can I do it? Help me, please.

    ReplyDelete
  13. Use the face landmarks that fall on the mouth corners? Look at the images in this post. The landmarks tell you where the mouth corners are.
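
    For example, with the standard 68-point model the outer mouth corners are parts 48 and 54 (0-indexed), so something like this should do it (the image path is just a placeholder):

    #include <dlib/image_processing/frontal_face_detector.h>
    #include <dlib/image_processing.h>
    #include <dlib/image_io.h>
    #include <iostream>
    using namespace dlib;

    int main()
    {
        frontal_face_detector detector = get_frontal_face_detector();
        shape_predictor sp;
        deserialize("shape_predictor_68_face_landmarks.dat") >> sp;

        array2d<rgb_pixel> img;
        load_image(img, "face.jpg");  // placeholder image path

        for (const rectangle& face : detector(img))
        {
            full_object_detection shape = sp(img, face);
            // Outer mouth corners in the 68-point iBUG annotation scheme.
            point corner1 = shape.part(48);
            point corner2 = shape.part(54);
            std::cout << "mouth corners: " << corner1 << " " << corner2 << std::endl;
        }
    }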

    ReplyDelete
  14. Hello Davis,

    First of all, many thanks for your hard work - dlib is simply awesome.

    I want to ask how to adjust the face landmark detector (yes, yes, this one over and over again..) to use the 194-point Helen database, since it only shows 68 face landmarks.

    I tried to follow your post from September 8, 2014 and experiment with train_shape_predictor_ex.cpp, but it still shows only 68 points.
    I would be very grateful for any hint on how to overcome this problem.
    Thanks in advance :D

    ReplyDelete
  15. You don't have to do anything to use a different annotation format. Just make an xml file with the annotations and run the trainer on it. The only thing that hard codes 68 is the GUI display code for drawing the lines between points. Don't call that code.
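
    For example, something like this stripped down version of train_shape_predictor_ex.cpp should work (the xml and output file names are placeholders, and the trainer settings are just the ones from that example):

    #include <dlib/image_processing.h>
    #include <dlib/data_io.h>
    #include <vector>
    using namespace dlib;

    int main()
    {
        dlib::array<array2d<unsigned char>> images;
        std::vector<std::vector<full_object_detection>> shapes;
        // Loads however many parts each box defines; nothing here is fixed to 68.
        load_image_dataset(images, shapes, "helen_training.xml");

        shape_predictor_trainer trainer;
        trainer.set_oversampling_amount(300);
        trainer.set_nu(0.05);
        trainer.set_tree_depth(2);
        trainer.be_verbose();

        shape_predictor sp = trainer.train(images, shapes);
        // This .dat file is what you pass to programs like face_landmark_detection_ex.
        serialize("my_shape_predictor.dat") << sp;
    }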

    ReplyDelete
  16. Hi Davis, thanks for your fast reply.

    Phew, at least the data is right :)
    I have found render_face_detections.h, which has fixed values. As I understand it, this is the code responsible for rendering the points?

    I know that is a silly question, but I need to extract the found data and also show that it was found, and I'm not that good with C++ at the moment (mostly Java at a junior level).



    ReplyDelete
  17. Hi Davis,

    I have used the --flip option in imglab recently.
    But I found that the image file paths in the .xml file are absolute paths instead of relative paths.
    Do you have any suggestions to fix this problem?
    Thank you!

    ReplyDelete
  18. Run imglab from the same folder as the xml file and it will preserve relative paths.

    ReplyDelete
  19. Hello Davis,

    First of all, thank you, as always, for your development of and great support for dlib.

    I am trying to use my own face dataset with landmark points with face_landmark_detection_ex.
    Although I have generated an .xml file of the bounding boxes and landmark positions of the faces,
    I am not sure how to generate a .dat file from it.
    I would like to call it like:
    face_landmark_detection_ex 'filename.dat' 'imagename'

    Is there any way to create the .dat file from the .xml file from the command prompt?

    Thank you in advance.

    ReplyDelete
  20. You can pass a lambda or any callable object. Put whatever state you want in that lambda or callable object.

    ReplyDelete