Sunday, January 14, 2018

Correctly Mirroring Datasets

I get asked a lot of questions about dlib's landmarking tools, and many of them are about how to prepare a good training dataset. One of the most useful tricks for creating a dataset is to mirror the data, since this effectively doubles the amount of training data. However, if you do this naively you end up with a terrible training dataset that produces really awful landmarking models, and people regularly ask me why this is happening.

To understand the issue, consider the following image of an annotated face from the iBUG 300-W dataset:

Since the mirror image of a face is still a face, we can mirror images like this to get more training data. However, what happens if you simply mirror the annotations? You end up with the wrong annotation labels! To see this, take a look at the figure below. The left image shows what happens if you naively mirror the above image and its landmarks. Note, for instance, that the points along the jawline are now annotated in reverse order. In fact, nearly all the annotations in the left image are wrong. Instead, you want to match the source image's labeling scheme. A mirrored image with the correct annotations is shown on the right.
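To make the problem concrete, here is a minimal Python sketch of naive mirroring. It is only an illustration, not imglab's code: the helper name `mirror_points_naive`, the toy five-point "jawline", and the pixel convention `x -> image_width - 1 - x` are all assumptions made for this example.

```python
def mirror_points_naive(points, image_width):
    """Flip each (x, y) landmark left-right while keeping its label index.

    Assumes integer pixel coordinates where x runs from 0 to image_width - 1.
    """
    return [(image_width - 1 - x, y) for (x, y) in points]


# Toy "jawline": labels 0..4 run from the left side of the image to the right.
jaw = [(10, 50), (20, 70), (30, 80), (40, 70), (50, 50)]

mirrored = mirror_points_naive(jaw, image_width=61)

# Label 0 now sits on the right side of the image and label 4 on the left,
# so the jawline is annotated in reverse order compared to the source scheme.
for label, point in enumerate(mirrored):
    print(label, point)
```

The point locations are still valid landmarks on the mirrored face. It is only the label attached to each location that no longer follows the source labeling scheme.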

Dlib's imglab tool has had a --flip option for a long time that would mirror a dataset for you. However, it used naive mirroring, and it was left up to the user to adjust any landmark labels appropriately. Many users found this confusing, so in the new version of imglab (v1.13) the --flip option now performs automatic source label matching using a 2D point registration algorithm. That is, it left-right flips the dataset and annotations, then registers the mirrored landmarks with the original landmarks and transfers labels appropriately. In fact, the "source label matching" image on the right was created by the new version of imglab.
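The flip, register, and transfer-labels idea can be sketched in a few lines of Python. To be clear, this is not imglab's implementation: the function name `flip_with_label_matching` is made up for this example, the "registration" here is just a centroid translation, and the matching is a greedy nearest-neighbor assignment. A real 2D point registration method would be more robust than this.

```python
import math


def flip_with_label_matching(points, image_width):
    """Mirror landmarks left-right, then reassign labels to match the source scheme.

    Returns a list where entry i is the mirrored-image location that should
    carry label i. Simplified sketch: centroid alignment plus greedy matching.
    """
    n = len(points)

    # Step 1: left-right flip (assumes x runs from 0 to image_width - 1).
    flipped = [(image_width - 1 - x, y) for (x, y) in points]

    # Step 2: crude registration. Translate the flipped points so that their
    # centroid coincides with the centroid of the original points.
    dx = sum(x for x, _ in points) / n - sum(x for x, _ in flipped) / n
    dy = sum(y for _, y in points) / n - sum(y for _, y in flipped) / n
    registered = [(x + dx, y + dy) for (x, y) in flipped]

    # Step 3: label transfer. Each original label goes to the closest
    # registered point that has not been claimed yet.
    result = [None] * n
    used = set()
    for label, (px, py) in enumerate(points):
        best = min(
            (j for j in range(n) if j not in used),
            key=lambda j: math.hypot(px - registered[j][0], py - registered[j][1]),
        )
        used.add(best)
        result[label] = flipped[best]
    return result


# Toy "jawline" sitting off-center in a 101 pixel wide image.
jaw = [(10, 50), (20, 70), (30, 80), (40, 70), (50, 50)]
relabeled = flip_with_label_matching(jaw, image_width=101)

# Labels run left to right again, just like in the source image.
for label, point in enumerate(relabeled):
    print(label, point)
```

Note that nothing in the sketch knows what a jawline is. The labels are recovered purely from how the mirrored point set lines up with the original one, which is why the approach depends on the annotated shape being roughly left-right symmetric.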

Finally, just to be clear, the point registration algorithm isn't specific to any one dataset. It doesn't have to be iBug's annotations. It doesn't have to be faces. It's a general point registration method that will work correctly for any kind of landmark-annotated data with left-right symmetry. If you want the old --flip behavior, you can use the new --flip-basic option to get a naive mirroring, but most users will want to use the new --flip.