Friday, October 7, 2016

Hipsterize Your Dog With Deep Learning

I'm getting ready to make the next dlib release, which should be out in a few days, and I thought I would point out a humorous new example program.  The dog hipsterizer!

It uses dlib's new deep learning tools to detect dogs looking at the camera. Then it uses the dlib shape predictor to identify the positions of the eyes, nose, and top of the head. From there it's trivial to make your dog hip with glasses and a mustache :)
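To give a feel for why the last step is easy, here is a minimal sketch of how two eye landmarks could be turned into a position, scale, and rotation for a glasses overlay. It is not the code from the example program: the `Point` and `Placement` types, the `place_glasses` function, and the `glasses_eye_gap` parameter are all hypothetical names made up for illustration, and the sketch assumes the overlay artwork has a known distance between its lens centers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2D point type, used here only to keep the sketch self-contained.
struct Point { double x, y; };

// Where to draw an overlay: its center, a uniform scale factor, and a rotation.
struct Placement {
    Point center;
    double scale;
    double angle_rad;
};

// Compute how to place a glasses overlay given the two detected eye positions.
// glasses_eye_gap is the distance in pixels between the lens centers in the
// overlay artwork (an assumed property of the artwork, not a dlib value).
Placement place_glasses(Point left_eye, Point right_eye, double glasses_eye_gap)
{
    assert(glasses_eye_gap > 0.0);

    const double dx = right_eye.x - left_eye.x;
    const double dy = right_eye.y - left_eye.y;
    const double eye_dist = std::hypot(dx, dy);

    Placement p;
    // Center the overlay on the midpoint between the eyes.
    p.center = { (left_eye.x + right_eye.x) / 2.0, (left_eye.y + right_eye.y) / 2.0 };
    // Stretch the artwork so its lens centers land on the eyes.
    p.scale = eye_dist / glasses_eye_gap;
    // Tilt the overlay to follow the line through the eyes.
    p.angle_rad = std::atan2(dy, dx);
    return p;
}
```

For eyes at (100, 120) and (160, 120) and artwork with a 30 pixel lens gap, this places the glasses at (130, 120), scaled by 2, with no rotation. A mustache could be placed the same way, anchored on the nose landmark instead of the eye midpoint.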

This is what you get when you run the dog hipsterizer on this awesome image:
[Image: Barkhaus dogs looking fancy]


Steven said...

Loving it!

Rob Siegel said...

You broke the mold on this one. You made a million dollar app idea and gave it away for free. Does Baxter love it?

Davis King said...

Ha, making apps is boring. I have better things to do :) Baxter is conflicted as always though.

BlueValhalla said...

Awesome! It turns out the Dog Hipsterizer works on bears too:


We'll probably end up using your pre-trained model as part of our project to identify the bears of Brooks Falls, Alaska. You can read more about it at our blog:

Davis King said...

Lol, awesome.

It sounds like you also want to do recognition. I've just added some deep learning tooling to dlib for that. You can see the introductory example program for it here: and a more advanced example here:

I've used that tooling to make a state-of-the-art face recognition model which I'll post online in a few days too. So it definitely works :)

Ed Miller said...

Yes, you are right, recognition is our aim. We've been roughly following the structure of FaceNet, and so far dlib has met all our needs. With this metric learning example, it looks like we can do the whole project using dlib. That will save us from having to fire up one of the more complicated neural net frameworks. Of course we have no idea if any of these networks will learn to differentiate between individual bears, but we're hopeful.

Thanks a million for dlib and for all the great examples!

Ed Miller said...

Hi Davis,

For the dnn_metric_learning_on_images_ex.cpp example you are working on, do you expect the input images to be face crops where the face has been transformed to be centered? I looked at the johns directory on github, but I'm not 100% sure if any are transformed.

How many "measurements" are there in the example embedding?

What sort of hardware do you run on? I think I've seen a mention of Titan X on some of your examples.

Davis King said...

You could use the metric learning with anything, not just faces. But generally speaking, anything you can do to normalize out irrelevant changes in your inputs is always good. For faces that means aligning them to some standard pose since the pose is irrelevant to identity. The johns in the folder are obviously centered and cropped in the same way. I would in general try to normalize your data as much as possible.

I don't know what you mean by "measurements".

I use a Titan X.

Ed Miller said...

Thanks for the feedback!

By "measurements" I mean the embedding dimensionality. For example, I believe the embedding for FaceNet was 128 floats per face. I see now that your FC layer is 128, so I guess that's the same.
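For readers new to the idea, a 128-dimensional embedding is typically used by comparing distances: two images are judged to show the same individual when their embedding vectors are close. The sketch below assumes Euclidean distance and a caller-supplied threshold. The names `Embedding`, `embedding_distance`, and `same_identity` are hypothetical, and the right threshold depends on the trained model, so none is suggested here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// 128 floats per image, matching the embedding size discussed above.
constexpr std::size_t kEmbeddingDim = 128;
using Embedding = std::array<float, kEmbeddingDim>;

// Euclidean (L2) distance between two embeddings.
double embedding_distance(const Embedding& a, const Embedding& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < kEmbeddingDim; ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Treat two embeddings as the same identity when they are closer than
// the threshold. The threshold is an assumption supplied by the caller.
bool same_identity(const Embedding& a, const Embedding& b, double threshold)
{
    assert(threshold > 0.0);
    return embedding_distance(a, b) < threshold;
}
```

In practice the embeddings would come from the network's final 128-output layer; here they are plain arrays so the comparison logic can be read on its own.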

Ed Miller said...

By the way, how long did it take to train the dnn_metric_learning_on_images_ex on your Titan X system and how big was the training set (or was it only the examples/johns)?

Davis King said...

Ah, yes, my model is also 128D.

Training took about 2 days and the training dataset is about 3 million images. I suspect you don't have 3 million images of bears but you might be able to get by using the human face model, or by doing fine tuning of the human face model with a smaller bear dataset, or even bootstrap a big dataset from nature videos of bears (and probably dog videos since they are so similar) to train a bear face recognizer.

Also, the johns dataset is trivially small, too small for any practical purpose. It's just there to make the example program runnable and illustrate the API. This is true of all the example programs in dlib. Their purpose is to educate, not to be usable applications.

Ed Miller said...

You're right. We don't have 3 million images of known bears. Pulling them from videos might get us there, but I'll have to make sure we know which bears are in the videos. If it turns out the same individual bear showed up in different sets without our knowledge, it would throw off the training since we would be telling the network it's 2 different bears when in fact it is the same bear. Still, if I use videos from different geographies, I can probably assume they are different bears.

I hadn't thought to try the human face model directly. You never know. I was surprised by the Dog Hipsterizer's ability to work with bears! Although I think the human face embedding working for bears is less likely.

I think transfer learning with tuning is our best chance. I had been planning to use a CNN that was trained for ImageNet and replace the FC layer. Perhaps we will first try with the face model and fine tune.

Do you train the entire metric learning network from random initialization using only faces? If so, the ResNet-34 you trained for ImageNet may be a better starting point, since its training data already includes bears (and other animals).

If you don't want me to pollute your blog comments, you can contact me at ed at hypraptive dot com. :)

Thanks for all your help!

Davis King said...

Well, be prepared to spend a lot of time manually reviewing and fixing whatever dataset you make. Hopefully you can tell the difference between bears with your own eyes or it's going to be hard.

I trained the entire network from scratch in one shot.

I have some doubts that initializing this with an ImageNet trained network will help, but you never know until you try.

Davis King said...

Just posted the face recognition model and example program:

Совака Улыбака said...

Hi, Davis! I think I found an error in the file image_pyramid.h:

ptype temp = temp_img[r-2][c] +
             temp_img[r-1][c]*4 +
             temp_img[r  ][c]*6 +
             temp_img[r-1][c]*4 +  // <--- must be +1
             temp_img[r-2][c];     // <--- must be +2

Thank you for your code!

Davis King said...

Oh yeah, good catch. Just fixed it.
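For anyone following along, the fix reported above amounts to making the vertical filter symmetric, with weights 1-4-6-4-1 applied to rows r-2 through r+2. Here is a minimal standalone sketch of that corrected tap. It is not the actual code from image_pyramid.h: the `Image` alias and the `vertical_tap5` function are hypothetical names, and the result is left unnormalized (the weights sum to 16).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical image type: rows of integer pixel values.
using Image = std::vector<std::vector<int>>;

// Apply the symmetric 1-4-6-4-1 kernel vertically at (r, c).
// The caller must make sure rows r-2 through r+2 exist.
// Returns the unnormalized weighted sum (weights add up to 16).
int vertical_tap5(const Image& img, std::size_t r, std::size_t c)
{
    assert(r >= 2 && r + 2 < img.size());
    assert(c < img[r].size());

    return img[r-2][c]
         + img[r-1][c]*4
         + img[r  ][c]*6
         + img[r+1][c]*4   // row below, not r-1 again
         + img[r+2][c];    // two rows below, not r-2 again
}
```

On a single column holding 1, 2, 3, 4, 5, the symmetric kernel centered on the middle row gives 48, which is 16 times the center value 3. Reusing rows r-1 and r-2 on the lower half, as in the reported bug, would give 36 instead.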