Wednesday, April 9, 2014

Dlib 18.7 released: Make your own object detector in Python!

A while ago I boasted about how dlib's object detection tools are better than OpenCV's. However, one thing OpenCV had on dlib was a nice Python API, but no longer!  The new version of dlib is out and it includes a Python API for using and creating object detectors. What does this API look like? Well, let's start by imagining you want to detect faces in this image:


You would begin by importing dlib and scikit-image:
import dlib
from skimage import io
Then you load dlib's default face detector, the image of Obama, and then invoke the detector on the image:
detector = dlib.get_frontal_face_detector()
img = io.imread('obama.jpg')
faces = detector(img)
The result is an array of boxes called faces. Each box gives the pixel coordinates that bound each detected face. To get these coordinates out of faces you do something like:
for d in faces:
    print("left, top, right, bottom: {}, {}, {}, {}".format(d.left(), d.top(), d.right(), d.bottom()))
We can also view the results graphically by running:
win = dlib.image_window()
win.set_image(img)
win.add_overlay(faces)

But what if you wanted to create your own object detector?  That's easy too.  Dlib comes with an example program and a sample training dataset showing how to do this.  But to summarize, you do:
options = dlib.simple_object_detector_training_options()
options.C = 5  # Set the SVM C parameter to 5.  
dlib.train_simple_object_detector("training.xml", "detector.svm", options)
That will run the trainer and save the learned detector to a file called detector.svm. The training data is read from training.xml which contains a list of images and bounding boxes. The example that comes with dlib shows the format of the XML file. There is also a graphical tool included that lets you mark up images with a mouse and save these XML files. Finally, to load your custom detector you do:
detector = dlib.simple_object_detector("detector.svm")
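For reference, training.xml uses the layout written by dlib's imglab tool. A minimal hypothetical sketch (the filenames and coordinates here are made up) looks roughly like this:

```xml
<?xml version='1.0' encoding='ISO-8859-1'?>
<dataset>
    <name>example training set</name>
    <images>
        <image file='images/obama.jpg'>
            <box top='78' left='221' width='135' height='135'/>
        </image>
        <image file='images/crowd.jpg'>
            <box top='30' left='40' width='80' height='80'/>
            <box top='52' left='210' width='80' height='80'/>
        </image>
    </images>
</dataset>
```

Each box element is a truth rectangle in pixel coordinates; check the example XML that ships with dlib for the authoritative format.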
If you want to try it out yourself you can download the new dlib release here.

60 comments:

  1. This library looks amazing!

    How can I go and install it? I cannot seem to find a good tutorial.

    ReplyDelete
  2. The comment at the top of each python example tells you what to do to compile the library.

    ReplyDelete
  3. Hello and thanks for your work.

    I tried to train a detector with the .py file you provided. It works well on about 10 images (each about 2000x2000, jpg), but it fails with a "Memory Error" on more than 10 images.
    Sorry if the solution to this problem is obvious.

    OS: Windows 7 64bit (using 32bit Python 2.7)

    ReplyDelete
  4. Oh, I guess I forgot an actual question: do you know why exactly this error occurs, and how can I prevent it while still training on more images? My goal is to train on several hundred images, each of the same size.

    ReplyDelete
  5. I used the imglab exe to make the file with the boxes. While running the code to build the .svm file, it sometimes fails partway through. When I checked and changed the width and height to random values it worked, but that will increase the chance of misclassification. How do the bounding boxes affect the training process?

    ReplyDelete
  6. What happens when it fails? Is there an error message?

    ReplyDelete
  7. Hi Davis,

    There's absolutely no error message. The last checkpoint is when it counts the number of images, and then it crashes.

    ReplyDelete
  8. So is there a certain aspect ratio to be maintained while drawing the bounding box over the object? On certain occasions the default window size of 80x80 does not seem to work unless changed to 50x50. What features should be common: similar height, width, aspect ratio, area, etc.?

    ReplyDelete
  9. There is no error message at all? What happens? The program terminates and nothing is output to disk or the screen?

    You should try to make all your boxes have a similar aspect ratio.

    ReplyDelete
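To make that advice concrete, here is a stdlib-only sketch (the imglab XML layout and the tolerance value are assumptions, not dlib settings) that reports how much the truth-box aspect ratios in a dataset file vary:

```python
# Quick sanity check: dlib's scan_fhog_pyramid slides ONE fixed-aspect-ratio
# window, so truth boxes with wildly different width/height ratios can't all
# be matched.  This reports the spread of aspect ratios in a dataset file
# that follows the imglab XML layout.
import xml.etree.ElementTree as ET

def box_aspect_ratios(xml_text):
    """Return the width/height ratio of every <box> in an imglab XML string."""
    root = ET.fromstring(xml_text)
    return [int(b.get('width')) / int(b.get('height'))
            for b in root.iter('box')]

def check_aspect_ratios(xml_text, tolerance=1.5):
    """Warn if the widest ratio is more than `tolerance` times the narrowest."""
    ratios = box_aspect_ratios(xml_text)
    lo, hi = min(ratios), max(ratios)
    ok = hi / lo <= tolerance
    print("aspect ratios: min=%.2f max=%.2f -> %s"
          % (lo, hi, "OK" if ok else "too much variation"))
    return ok
```

Run it on the contents of training.xml before training; if it flags wide variation, either fix the outlying boxes or train separate detectors per shape.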
  10. There is absolutely no message on the screen; it just crashes. I think most of the boxes maintain the aspect ratio. I can share the XML with you if you want to analyse it.

    ReplyDelete
  11. Sure, if you can post a complete program that demonstrates the error you are seeing that would be great.

    ReplyDelete
  12. How do I save the image to a file? I don't have a GUI.

    ReplyDelete
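If no GUI is available, one option is to skip dlib.image_window entirely: draw the detection boxes into the pixel data yourself and write a simple image format. This is a stdlib-only sketch, not part of dlib's API; the nested-list image here stands in for the array io.imread would give you:

```python
# GUI-free alternative: draw detection boxes into an RGB pixel buffer and
# save it as a binary PPM file using only the standard library.
def draw_box(img, left, top, right, bottom, color=(255, 0, 0)):
    """Draw a 1-pixel rectangle outline onto img (a list of rows of RGB tuples)."""
    for x in range(left, right + 1):
        img[top][x] = color
        img[bottom][x] = color
    for y in range(top, bottom + 1):
        img[y][left] = color
        img[y][right] = color

def save_ppm(img, path):
    """Write an RGB image (list of rows of (r, g, b) tuples) as binary PPM."""
    h, w = len(img), len(img[0])
    with open(path, 'wb') as f:
        f.write(b'P6 %d %d 255\n' % (w, h))
        for row in img:
            for r, g, b in row:
                f.write(bytes((r, g, b)))
```

After calling draw_box once per detection, save_ppm(img, 'detections.ppm') gives a file any image viewer can open; with scikit-image installed you could instead modify the numpy array and call io.imsave.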
  13. I am using evaluate_detectors(); how do I know which detector returned the matching rectangle?

    Thanks

    ReplyDelete
  14. The documentation for evaluate_detectors() tells you how: http://dlib.net/dlib/image_processing/scan_fhog_pyramid_abstract.h.html#evaluate_detectors

    ReplyDelete
  15. Managed to complete the entire thing; the only thing stopping me now is this:

    I have put the entire training into a function. The training goes fine and it generates the detector, but it crashes at the function's exit point. Any idea about that?

    I've tried everything. I have a hunch that the thread (dlib::Svm_thread) may not be getting released. Could that be the issue? If so, how do I make the function wait for the thread to finish?

    ReplyDelete
  16. Do the example programs run without crashing if you don't modify them? If yes then there is probably a bug in your code, not in dlib.

    ReplyDelete
  17. I'm using Dlib to redact people's heads from body camera footage to post at https://www.youtube.com/channel/UCcdSPRNt1HmzkTL9aSDfKuA . Should I be making different .svm files for the various head positions? How many different videos do I need to train on in order to create a very reliable head detector?

    ReplyDelete
  18. If you have heads it isn't detecting then yes, you need to train more models for those head poses. A few hundred examples is usually sufficient for the training to give quite good results.

    ReplyDelete
  19. Hello Mr. King,
    I have two questions:

    i) Do I have independent control over the width and height of the detection window? I could not set a tuple for this option, and I need the detection area to be a non-square rectangle.

    ii) Do I have control over the image pyramid? For my current project I don't need/want to apply the algorithm at different scales.

    I tried experimenting and reading the docs, so I suspect the answer is 'no' to both questions. Since these options are available in the C++ implementation: would it be much work to recompile the C++ code to get a new dlib.pyd that exposes the needed options?

    Thanks for your time.

    ReplyDelete
  20. The Python interface picks the best aspect ratio for the detection window based on your training data. So if most of your training boxes are twice as tall as they are wide then the detection window will be like that too.

    If you want more control then you need to use the C++ API rather than trying to modify the Python API, as that is a lot of work. I mean, you can, but if you have enough ability to modify the underlying C++-to-Python wrapper then you can just work in C++ in a fraction of the time.

    ReplyDelete
  21. Hello. Looks cool. I tried following along, but am too dumb to install dlib so that the Python import works.

    I followed the usual install instructions as far as

    cmake --build . --config Release

    which seemed to work, but Python remains unaware. Any ideas, or is there an idiot's guide on how to do this?

    Ta

    ReplyDelete
  22. Oops - just saw the comment to read the Python examples - I'll try that

    ReplyDelete
  23. Hello Mr. King,
    can you elaborate on the .svm file that is produced (and re-used) by the object detector/trainer?

    i) What information is stored in this file?
    ii) How can I read and modify it?
    iii) Is this exact .svm file compatible with the pure C++ implementation and therefore also usable by this (e.g. if I want to train in python but someday decide to switch to C++)?
    iv) Am I dependent on dlib or can I somehow access the svm parameters which are stored in that file (and therefore use it with another SVM module)?

    Thanks for your time.

    ReplyDelete
  24. The python code is just a wrapper around dlib's C++ code. So you can load and use the object detectors without issue in C++.

    The file isn't encrypted in any way, so you can read the values out of it and do whatever you want if you are motivated to write your own processing code. It is, however, highly technical; all the details are documented on the main C++ side of dlib and in this paper: http://arxiv.org/abs/1502.00046

    ReplyDelete
  25. How do I go about debugging the script being killed very early on in training?

    ReplyDelete
  26. No there isn't. The output is
    Training with C: 5
    Training with epsilon: 0.01
    Training using 2 threads.
    Training with sliding window 79 pixels wide by 81 pixels tall.
    Training on both left and right flipped versions of images.
    Upsample images...
    Upsample images...
    Killed

    ReplyDelete
  27. Maybe it ran out of RAM. How many images did you give it? Are you compiling a 32bit executable and therefore only able to use 2GB of RAM?

    ReplyDelete
  28. I gave it 64 images. I have 3GB of RAM and am running Linux. On Linux I don't have to force it to compile a 64-bit executable, right?

    ReplyDelete
  29. If you are using 64-bit Linux then everything is always 64-bit, so there isn't anything you need to do to use all the RAM.

    I don't know what's happening. Does the trainer work when you run it without modification on the training data that comes with dlib?

    ReplyDelete
  30. I upgraded to 4 cores and 7GB of RAM on Azure and now get the error below. Normally when this happens it tells me the filename. It does work with the examples, and works off and on with my own images.

    I'm having trouble figuring out which image this is.

    image index 83
    match_eps: 0.5
    best possible match: 0.488811
    truth rect: [(724, 6) (968, 150)]
    truth rect width/height: 1.68966
    truth rect area: 35525
    nearest detection template rect: [(773, -23) (923, 143)]
    nearest detection template rect width/height: 0.904192
    nearest detection template rect area: 25217

    ReplyDelete
  31. It is the image at index 83 in the list of images you gave to the training code.

    ReplyDelete
  32. Does the indexing start at 0 or 1? I'm going to write a script to tell me the filename of a video by index.

    ReplyDelete
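For what it's worth, the index dlib prints appears to be 0-based (it is the position of the image entry in the dataset file). A stdlib-only helper to map an index back to a filename, assuming the imglab XML layout, might look like:

```python
# Map an "image index N" from dlib's error output back to a filename.
# The index appears to be a 0-based position over the <image> entries in
# the dataset XML (imglab layout assumed).
import xml.etree.ElementTree as ET

def image_file_at_index(xml_text, index):
    """Return the file= attribute of the index-th <image> element, 0-based."""
    images = list(ET.fromstring(xml_text).iter('image'))
    return images[index].get('file')
```

So for the error above you would pass 83 and get back the offending image's path.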
  33. This comment has been removed by the author.

    ReplyDelete
  34. Can I use the dlib training method to detect people by feeding it pictures of people's body shapes etc., and then track them using a Raspberry Pi? Would it have enough power to do tracking with dlib in real time?

    ReplyDelete
  35. Tracking is a little bit more than just detection. You might want to use dlib's Real Time Video Object Tracking: http://blog.dlib.net/2015/02/dlib-1813-released.html

    ReplyDelete
  36. How do I put text on top of the detection rectangle?

    ReplyDelete
  37. Hi Davis,
    How do I reuse the .svm file generated by the HOG object detector? I am using Visual Studio 12 as my compiler.

    ReplyDelete
  38. Thank you Davis.
    I have created my own detector.
    The results are:

    Trained with C: 5
    Training accuracy: precision: 0.991111, recall: 0.771626, average precision: 0.769863
    Testing accuracy: precision: 0.986111, recall: 0.731959, average precision: 0.723225

    Trained with C: 10
    Training accuracy: precision: 0.991701, recall: 0.82699, average precision: 0.82468
    Testing accuracy: precision: 0.975309, recall: 0.814433, average precision: 0.804037

    Trained with C: 20
    Training accuracy: precision: 0.992248, recall: 0.885813, average precision: 0.883479
    Testing accuracy: precision: 0.976744, recall: 0.865979, average precision: 0.854488

    Trained with C: 25
    Training accuracy: precision: 0.996169, recall: 0.899654, average precision: 0.897599
    Testing accuracy: precision: 0.977011, recall: 0.876289, average precision: 0.864523

    Trained with C: 30
    Training accuracy: precision: 0.996212, recall: 0.910035, average precision: 0.908016
    Testing accuracy: precision: 0.967033, recall: 0.907216, average precision: 0.894226

    Trained with C: 40
    Training accuracy: precision: 0.996255, recall: 0.920415, average precision: 0.918458
    Testing accuracy: precision: 0.967033, recall: 0.907216, average precision: 0.895631

    Trained with C: 50
    Training accuracy: precision: 0.99631, recall: 0.934256, average precision: 0.932443
    Testing accuracy: precision: 0.967391, recall: 0.917526, average precision: 0.904212

    Trained with C: 100
    Training accuracy: precision: 0.996377, recall: 0.951557, average precision: 0.949977
    Testing accuracy: precision: 0.9375, recall: 0.927835, average precision: 0.913309

    I think C: 30 is the best.
    What do you think about this?

    ReplyDelete
  39. Which is best really depends on your application, and in particular, how much you care about different types of errors.

    ReplyDelete
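One way to make that trade-off concrete: if you treat false positives and false negatives as equally bad, the F1 score (the harmonic mean of precision and recall) is a defensible single number to rank the runs. A small sketch over the testing numbers posted above:

```python
# Pick C by F1 score on the testing numbers from the comment above.  This
# assumes both error types matter equally; weight them differently and the
# "best" C changes, which is the point of the reply.
testing = {  # C -> (precision, recall) on the test split
    5:   (0.986111, 0.731959),
    10:  (0.975309, 0.814433),
    20:  (0.976744, 0.865979),
    25:  (0.977011, 0.876289),
    30:  (0.967033, 0.907216),
    40:  (0.967033, 0.907216),
    50:  (0.967391, 0.917526),
    100: (0.937500, 0.927835),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

best_C = max(testing, key=lambda C: f1(*testing[C]))
print("best C by testing F1:", best_C)
```

By this particular metric C: 50 edges out C: 30, but weight precision more heavily and the ranking shifts.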
  40. This comment has been removed by the author.

    ReplyDelete
  41. Tuning is difficult for me
    Thank you Davis

    ReplyDelete
  42. I am using the Python example to train on a custom object (a road sign), but the detector draws a bigger, somewhat arbitrary box around the detection area. I thought it would be accurate and draw the box exactly around the matching object, so obviously something has gone wrong with the training. Has anyone else experienced this before? I resized all boxes to 80x80 and set my detection window size to 6400.

    ReplyDelete
  43. Hi, if I use more training data to train an object detector, will the detection time be longer than before? For example, I use 50 people to train detector V1 and 100 people to train detector V2, then use V1 and V2 to detect faces. I want to know whether the detection time is the same. Many thanks.

    ReplyDelete
  44. The detection time is always the same.

    ReplyDelete
  45. How about training time? How different would that be?

    ReplyDelete
  46. As you would expect, more training data makes training take longer.

    ReplyDelete
  47. Hi Davis;

    Is it possible to use images that don't contain any target object (so no boxes in the XML for these images) in training?

    ReplyDelete
  48. Not all images in the training data need labels. Any part of any image that doesn't have a box on it is treated as negative data and the algorithm will learn to not put boxes there.

    ReplyDelete
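Concretely, in the imglab XML layout an image listed with no box children contributes nothing but negative examples; a hypothetical entry would just be:

```xml
<image file='images/scene_with_no_targets.jpg'>
</image>
```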
  49. How do I convert an SVM file to the DAT file extension?

    ReplyDelete
  50. Nothing in dlib cares about the file extension.

    ReplyDelete
  51. Hi Davis,
    get_frontal_face_detector is based on HOG features and a linear classifier (SVM), yet you call get_frontal_face_detector in the face detection programs without deserializing any previously trained SVM model. I wonder how get_frontal_face_detector works without training data.

    ReplyDelete
  52. This comment has been removed by the author.

    ReplyDelete
  53. Hi Davis

    I am working on the blind-spot detection problem for a vehicle. I want to detect cars, motorbikes, pedestrians, or other vulnerable road users while changing lanes, so I thought of detecting them using dlib. We wanted to try a HOG + SVM detector. I tried detection window sizes of 80x80, 60x60, 40x40, etc., and also changed the pyramid parameter from 6 to 12, but it always produces errors like the one shown below, so I think the problem is with varying aspect ratios. I get errors like the following, with this exception:

    An impossible set of object labels was detected. This is happening because none
    of the object locations checked by the supplied image scanner is a close enough
    match to one of the truth boxes in your training dataset. To resolve this you
    need to either lower the match_eps, adjust the settings of the image scanner so
    that it is capable of hitting this truth box, or adjust the offending truth
    rectangle so it can be matched by the current image scanner. Also, if you are
    using the scan_fhog_pyramid object then you could try using a finer image
    pyramid. Additionally, the scan_fhog_pyramid scans a fixed aspect ratio box
    across the image when it searches for objects. So if you are getting this error
    and you are using the scan_fhog_pyramid, it's very likely the problem is that
    your training dataset contains truth rectangles of widely varying aspect
    ratios. The solution is to make sure your training boxes all have about the
    same aspect ratio.

    image index 2
    match_eps: 0.5
    best possible match: 0.457987
    truth rect: [(561, 484) (621, 566)]
    truth rect width/height: 0.73494
    truth rect area: 5063
    nearest detection template rect: [(572, 492) (652, 572)]
    nearest detection template rect width/height: 1
    nearest detection template rect area: 6561

    Would you be kind enough to tell me what match_eps stands for? I understand that this is the 3rd image in order, but what do the other parameters represent? Could you please suggest how to proceed with this problem? We would prefer to do it on the CPU, if possible.

    ReplyDelete
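For what it's worth, match_eps is the minimum overlap dlib requires between a truth box and some window the scanner can actually produce before it considers that box learnable; the overlap measure is, as far as I can tell from the C++ side, intersection area divided by union area. A stdlib-only sketch using the two rectangles from the printout above (dlib rectangles use inclusive pixel coordinates):

```python
# Overlap (intersection over union) between a truth box and a candidate
# detection window.  dlib rectangles use inclusive pixel coordinates, so a
# rect from (561,484) to (621,566) is 61 x 83 pixels.
def area(l, t, r, b):
    return max(0, r - l + 1) * max(0, b - t + 1)

def iou(rect_a, rect_b):
    """Intersection-over-union of two (left, top, right, bottom) rectangles."""
    (l1, t1, r1, b1), (l2, t2, r2, b2) = rect_a, rect_b
    inter = area(max(l1, l2), max(t1, t2), min(r1, r2), min(b1, b2))
    union = area(*rect_a) + area(*rect_b) - inter
    return inter / union

truth    = (561, 484, 621, 566)   # truth rect from the error message
template = (572, 492, 652, 572)   # nearest detection template rect
print("truth area:", area(*truth))        # 5063, matching the printout
print("overlap:", round(iou(truth, template), 3))
```

The overlap comes out around 0.476, below the 0.5 match_eps; the "best possible match" dlib prints (0.457987) is lower still, presumably because it is computed only over the positions and scales the pyramid can actually reach. Making the truth boxes share roughly one aspect ratio, as the error message suggests, is the usual fix.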
  54. Hi Davis,
    In my training data, I have six classes, including a background (negative) class. I would like to know whether it is possible to obtain multiclass SVM probabilities in dlib. I want the SVM to output not only the class labels but also their confidence values. Please help me in this regard.


    ReplyDelete
  55. The output includes the SVM confidence values. Consult the documentation to see how to get it.

    ReplyDelete
  56. Hello Davis,
    First, thank you so much for your work; it is really helpful in many ways.

    I am trying to retrain the face detector on some thermal images. To do so I am using the Python example train_object_detector.py, and I am having some issues with the dlib.train_simple_object_detector() function.

    My first goal was to train it on 5000 images of 160x120 pixels.

    But I have been having some RAM issues. I tried resizing the images, but then the bounding boxes were too small ("smaller than about 400 pixels in area").
    So I found out that 500 images were the maximum I could use.
    So now I am training on those 500 images and I always get:

    Training accuracy: precision: 1, recall: 0, average precision: 0
    Testing accuracy: precision: 1, recall: 0, average precision: 0

    Do you have any idea of what could be wrong in what I am doing?

    Thanks a lot

    ReplyDelete
  57. Your labels are most likely inaccurate or inconsistent in some way. Train on a smaller dataset that you are sure is labeled the way you really want. Get it working on that, then run that resulting model on the other images and see where it disagrees with labels. Or add more images but review them to make sure the boxes are in the right places.

    ReplyDelete