You would begin by importing dlib and scikit-image:
    import dlib
    from skimage import io

Then you load dlib's default face detector, the image of Obama, and invoke the detector on the image:
    detector = dlib.get_frontal_face_detector()
    img = io.imread('obama.jpg')
    faces = detector(img)

The result is an array of boxes called faces. Each box gives the pixel coordinates that bound each detected face. To get these coordinates out of faces you do something like:
    for d in faces:
        print("left,top,right,bottom:", d.left(), d.top(), d.right(), d.bottom())

We can also view the results graphically by running:
    win = dlib.image_window()
    win.set_image(img)
    win.add_overlay(faces)
But what if you wanted to create your own object detector? That's easy too. Dlib comes with an example program and a sample training dataset showing how to do this. But to summarize, you do:
    options = dlib.simple_object_detector_training_options()
    options.C = 5  # Set the SVM C parameter to 5.
    dlib.train_simple_object_detector("training.xml", "detector.svm", options)

That will run the trainer and save the learned detector to a file called detector.svm. The training data is read from training.xml, which contains a list of images and bounding boxes. The example that comes with dlib shows the format of the XML file. There is also a graphical tool included that lets you mark up images with a mouse and save these XML files. Finally, to load your custom detector you do:
    detector = dlib.simple_object_detector("detector.svm")

If you want to try it out yourself you can download the new dlib release here.
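As a quick sanity check before using a custom detector, you can also score it against a labeled dataset. A minimal sketch, assuming you made a held-out testing.xml in the same format as training.xml:

    import dlib

    # Prints precision, recall, and average precision for the saved detector.
    # "testing.xml" is a hypothetical held-out dataset in the imglab format.
    print("Testing accuracy: {}".format(
        dlib.test_simple_object_detector("testing.xml", "detector.svm")))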
60 comments:
This library looks amazing!
How can I go and install it? I cannot seem to find a good tutorial.
The comment at the top of each python example tells you what to do to compile the library.
Hello and thanks for your work.
I tried to train a detector with the .py file you provided. It works well on about 10 images (each about 2000x2000, jpg), but it fails with "Memory Error" on more than 10 images.
Sorry if the solution to this problem is obvious.
OS: Windows 7 64bit (using 32bit Python 2.7)
Oh, I guess I forgot to ask an actual question: do you know why exactly this error occurs, and how I can prevent it while still training on more images? My goal is to train on some hundreds of images, each of the same size.
I used the imglab exe to make the file with the boxes. While running the code to build the .svm file, it sometimes fails partway through, so I checked: changing the width and the height to random values made it work, but that will increase the chances of misclassification. How are the bounding boxes affecting this training process?
What happens when it fails? Is there an error message?
Hi Davis,
There's absolutely no error message. The last checkpoint is when it counts the number of images, and then the crash.
So is there a certain aspect ratio to be maintained while drawing the bounding box over the object? On certain occasions the default window size of 80 x 80 does not seem to work unless changed to 50 x 50. What features should be common: similar height, width, aspect ratio, area, etc.?
There is no error message at all? What happens? The program terminates and nothing is output to disk or the screen?
You should try to make all your boxes have a similar aspect ratio.
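A quick way to audit this is to scan the dataset XML for outliers. A rough sketch, assuming the imglab file layout (<image file='...'> elements containing <box top='...' left='...' width='...' height='...'> children):

    import xml.etree.ElementTree as ET

    # Flag boxes whose aspect ratio strays far from the dataset median.
    ratios = []
    for image in ET.parse("training.xml").iter("image"):
        for box in image.iter("box"):
            w = float(box.get("width"))
            h = float(box.get("height"))
            ratios.append((w / h, image.get("file")))

    ratios.sort()
    median = ratios[len(ratios) // 2][0]
    for ratio, fname in ratios:
        if ratio < 0.75 * median or ratio > 1.33 * median:
            print("possible outlier: %s (aspect ratio %.2f)" % (fname, ratio))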
There is absolutely no message on the screen; it just crashes. I think most of the boxes are made to maintain the aspect ratio. I can share the XML with you if you wish to analyse it?
Sure, if you can post a complete program that demonstrates the error you are seeing that would be great.
How do I save the image to a file? I don't have a GUI.
I am using evaluate_detectors(). How do I know which detector has returned the true value for the rectangle?
Thanks
The documentation for evaluate_detectors() tells you how: http://dlib.net/dlib/image_processing/scan_fhog_pyramid_abstract.h.html#evaluate_detectors
Managed to complete the entire thing; the only thing that is stopping me is this:
I have wrapped the entire training in a function. The training happens fine, everything is OK, and it generates the detector, but it just crashes at the function exit point. Any idea about that?
Tried everything. I have a hunch that the thread (dlib::Svm_thread) is maybe not getting released. Could that be the issue? If so, how do I ask the function to wait for the thread to finish?
Do the example programs run without crashing if you don't modify them? If yes then there is probably a bug in your code, not in dlib.
I'm using dlib to redact people's heads from body camera footage to post at https://www.youtube.com/channel/UCcdSPRNt1HmzkTL9aSDfKuA. Should I be making different .svm files for the various head positions? How many different videos do I need to train on in order to create a very reliable head detector?
If you have heads it isn't detecting then yes, you need to train more models for those head poses. A few hundred examples is usually sufficient for the training to give quite good results.
Hello Mr. King,
I have two questions:
i) Do I have independent control over the width and height of the detection_window_size? I could not set a tuple for this option, and I need the detection area to be a non-square rectangle.
ii) Do I have control over the pyramid size? For a current project, I don't need/want to apply the algorithm at different scales.
I tried experimenting and reading the docs, so I suspect the answer is 'no' to both questions. Since these options are available in the C++ implementation: would it be much work to re-compile the C++ code to get a new dlib.pyd file which exposes the needed options?
Thanks for your time.
The python interface picks the best aspect ratio for the detection window based on your training data. So if most of your training boxes are two times as tall as wide then the detection window will be like that too.
If you want more control then you need to use the C++ API rather than trying to modify the Python API, as that is a lot of work. I mean, you can, but if you have enough ability to modify the underlying C++-to-Python API implementation then you can just work in C++ in a fraction of the time.
Hello. Looks cool. I tried following along but am too dumb to install dlib so that the Python import works.
I followed the usual install instructions as far as
cmake --build . --config Release
which seemed to work, but Python remains unaware. Any ideas, or is there an idiot's guide as to how to do this?
Ta
Oops - just saw the comment to read the Python examples - I'll try that
Hello Mr. King,
can you elaborate on the .svm file that is produced (and re-used) by the object detector/trainer?
i) What information is stored in this file?
ii) How can I read and modify it?
iii) Is this exact .svm file compatible with the pure C++ implementation and therefore also usable by it (e.g. if I want to train in Python but someday decide to switch to C++)?
iv) Am I dependent on dlib or can I somehow access the svm parameters which are stored in that file (and therefore use it with another SVM module)?
Thanks for your time.
The python code is just a wrapper around dlib's C++ code. So you can load and use the object detectors without issue in C++.
The file isn't somehow encrypted, so you can read the values out of it and do whatever you want if you were motivated to write your own processing code. It is however highly technical, but all the details are documented in the main C++ side of dlib and in this paper: http://arxiv.org/abs/1502.00046
How do I go about debugging the script being killed very early on in training?
Is there an error message?
No there isn't. The output is:
Training with C: 5
Training with epsilon: 0.01
Training using 2 threads.
Training with sliding window 79 pixels wide by 81 pixels tall.
Training on both left and right flipped versions of images.
Upsample images...
Upsample images...
Killed
Maybe it ran out of RAM. How many images did you give it? Are you compiling a 32bit executable and therefore only able to use 2GB of RAM?
I gave it 64 images. I have 3GB of RAM and am running Linux. On Linux I don't have to force it to compile a 64bit executable, right?
If you are using 64bit Linux then everything is always 64bit, so there isn't anything you need to do to use all the RAM.
I don't know what's happening. Does the trainer work when you run it without modification on the training data that comes with dlib?
I upgraded to 4 cores and 7GB of RAM on Azure and now get the error below. Normally when this happens it tells me the filename. It does work with the examples, and works off and on with my own images.
I'm having trouble figuring out which image this is.
image index 83
match_eps: 0.5
best possible match: 0.488811
truth rect: [(724, 6) (968, 150)]
truth rect width/height: 1.68966
truth rect area: 35525
nearest detection template rect: [(773, -23) (923, 143)]
nearest detection template rect width/height: 0.904192
nearest detection template rect area: 25217
It is the image at index 83 in the list of images you gave to the training code.
Does the indexing start at 0 or 1? I'm going to write a script to tell me the filename of a video by index.
It starts at 0
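Since the index just counts <image> entries in the dataset XML, a short sketch like this (assuming the imglab layout) can map it back to a filename:

    import xml.etree.ElementTree as ET

    # The "image index" in the error message is the 0-based position of the
    # <image> entry in the training XML.
    files = [img.get("file") for img in ET.parse("training.xml").iter("image")]
    print(files[83])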
Can I use the dlib training method to detect people by feeding it pictures of people, body shapes, etc., and track them using a Raspberry Pi? Would it have enough power to do tracking with dlib in real time?
Tracking is a little bit more than just detection. You might want to use dlib's Real Time Video Object Tracking: http://blog.dlib.net/2015/02/dlib-1813-released.html
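For reference, the Python interface to that tracker looks roughly like this (the frame file names here are hypothetical):

    import dlib
    from skimage import io

    tracker = dlib.correlation_tracker()

    # Seed the tracker with an initial bounding box on the first frame.
    first = io.imread("frame0.jpg")
    tracker.start_track(first, dlib.rectangle(74, 67, 112, 153))

    # Update it on each subsequent frame and read back the tracked position.
    for i in range(1, 100):
        frame = io.imread("frame%d.jpg" % i)
        tracker.update(frame)
        print(tracker.get_position())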
How do I put text on top of the detection rectangle?
Hi Davis,
How do I reuse the .svm generated by hog_object_detector? I am using Visual Studio 12 as the compiler.
Thank you Davis
I have created my own detector.
Its results are:
Trained with C: 5
Training accuracy: precision: 0.991111, recall: 0.771626, average precision: 0.769863
Testing accuracy: precision: 0.986111, recall: 0.731959, average precision: 0.723225
Trained with C: 10
Training accuracy: precision: 0.991701, recall: 0.82699, average precision: 0.82468
Testing accuracy: precision: 0.975309, recall: 0.814433, average precision: 0.804037
Trained with C: 20
Training accuracy: precision: 0.992248, recall: 0.885813, average precision: 0.883479
Testing accuracy: precision: 0.976744, recall: 0.865979, average precision: 0.854488
Trained with C: 25
Training accuracy: precision: 0.996169, recall: 0.899654, average precision: 0.897599
Testing accuracy: precision: 0.977011, recall: 0.876289, average precision: 0.864523
Trained with C: 30
Training accuracy: precision: 0.996212, recall: 0.910035, average precision: 0.908016
Testing accuracy: precision: 0.967033, recall: 0.907216, average precision: 0.894226
Trained with C: 40
Training accuracy: precision: 0.996255, recall: 0.920415, average precision: 0.918458
Testing accuracy: precision: 0.967033, recall: 0.907216, average precision: 0.895631
Trained with C: 50
Training accuracy: precision: 0.99631, recall: 0.934256, average precision: 0.932443
Testing accuracy: precision: 0.967391, recall: 0.917526, average precision: 0.904212
Trained with C: 100
Training accuracy: precision: 0.996377, recall: 0.951557, average precision: 0.949977
Testing accuracy: precision: 0.9375, recall: 0.927835, average precision: 0.913309
I think C: 30 is the best
What do you think about this?
Which is best really depends on your application, and in particular, how much you care about different types of errors.
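For what it's worth, a sweep like the one above is easy to script. A rough sketch (the XML paths are placeholders):

    import dlib

    # Train at several C values and compare accuracy on a held-out set.
    for C in [5, 10, 20, 25, 30, 40, 50, 100]:
        options = dlib.simple_object_detector_training_options()
        options.C = C
        dlib.train_simple_object_detector("training.xml", "detector.svm", options)
        print("Trained with C:", C)
        print("Testing accuracy:",
              dlib.test_simple_object_detector("testing.xml", "detector.svm"))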
Tuning is difficult for me
Thank you Davis
I am using the Python example to train a custom object detector (road signs), but the detection window draws a bigger, arbitrary box around the detection area. I thought it would be accurate and draw it exactly around the matching object; obviously something has gone wrong with the training. Has anyone else experienced this before? I resized all boxes to 80x80 and set my detection window size to 6400.
Hi, if I use more training data to train an object detector, will the detection time be longer than before? For example, say I use 50 people to train detector V1 and 100 people to train detector V2, then use both V1 and V2 to detect faces. I want to know if the detection time is the same. Many thanks.
The detection time is always the same.
How about training time? How different would that be?
As you would expect, more training data makes training take longer.
Hi Davis;
Is it possible to include images that don't contain any target object (so there is no box in the XML for that image) in training?
Not all images in the training data need labels. Any part of any image that doesn't have a box on it is treated as negative data and the algorithm will learn to not put boxes there.
How do I convert an .svm file to a .dat file extension?
Nothing in dlib cares about the file extension.
Hi Davis,
get_frontal_face_detector is based on HOG features and a linear classifier (SVM). You call get_frontal_face_detector in face detection programs without deserializing the previous SVM training results. I wonder how get_frontal_face_detector works without training data.
Hi Davis
I am working on the blind spot detection problem for a vehicle, so I want to detect cars, motorbikes, pedestrians, or other vulnerable road users while changing lanes. I thought of detecting them using dlib, and we wanted to try the HOG + SVM detector. I tried detection window sizes of 80x80, 60x60, 40x40, etc., and also changed the pyramid parameter from 6 to 12, but it always produces errors as shown below. So I think the problem is with varying aspect ratios. I get errors like the following, with this exception:
An impossible set of object labels was detected. This is happening because none
of the object locations checked by the supplied image scanner is a close enough
match to one of the truth boxes in your training dataset. To resolve this you
need to either lower the match_eps, adjust the settings of the image scanner so
that it is capable of hitting this truth box, or adjust the offending truth
rectangle so it can be matched by the current image scanner. Also, if you are
using the scan_fhog_pyramid object then you could try using a finer image
pyramid. Additionally, the scan_fhog_pyramid scans a fixed aspect ratio box
across the image when it searches for objects. So if you are getting this error
and you are using the scan_fhog_pyramid, it's very likely the problem is that
your training dataset contains truth rectangles of widely varying aspect
ratios. The solution is to make sure your training boxes all have about the
same aspect ratio.
image index 2
match_eps: 0.5
best possible match: 0.457987
truth rect: [(561, 484) (621, 566)]
truth rect width/height: 0.73494
truth rect area: 5063
nearest detection template rect: [(572, 492) (652, 572)]
nearest detection template rect width/height: 1
nearest detection template rect area: 6561
Would you be kind enough to tell me what match_eps stands for? I could only understand that it's the 3rd image in order, but what do the other parameters represent? Could you please suggest how I should go about this problem? We would prefer to do it on the CPU, if possible.
Hi Davis,
In my training data, I have six classes, including a background class (a negative class). I would like to know whether it is possible to obtain multiclass SVM probabilities in dlib. I want the SVM to output not only the class labels but also their confidence values. Please help me in this regard.
The output includes the SVM confidence values. Consult the documentation to see how to get it.
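For example, with the HOG detectors the run() method returns a confidence score alongside each box. A minimal sketch using the built-in face detector:

    import dlib
    from skimage import io

    detector = dlib.get_frontal_face_detector()
    img = io.imread("obama.jpg")

    # Upsample once and lower the threshold to -1 so weak detections show up.
    boxes, scores, detector_idxs = detector.run(img, 1, -1)
    for box, score in zip(boxes, scores):
        print(box, score)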
Many thanks for the reply.
Hello Davis,
First, thank you so much for your work; it is really helpful in many ways.
I am trying to retrain the face detector on some thermal images. To do so I am using the Python example train_object_detector.py, and I am having some issues with the dlib.train_simple_object_detector() function.
My first goal was to train it on 5000 images of dimension 160x120 pixels.
But I have been having some RAM issues. I tried to resize the images, but then the bounding boxes were too small ("smaller than about 400 pixels in area").
So I found out that 500 images were the maximum I could use.
So now I am training on those 500 images and I am always getting:
Training accuracy: precision: 1, recall: 0, average precision: 0
Testing accuracy: precision: 1, recall: 0, average precision: 0
Do you have any idea of what could be wrong in what I am doing?
Thanks a lot
Your labels are most likely inaccurate or inconsistent in some way. Train on a smaller dataset that you are sure is labeled the way you really want. Get it working on that, then run that resulting model on the other images and see where it disagrees with labels. Or add more images but review them to make sure the boxes are in the right places.
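A rough sketch of that audit loop, assuming the imglab XML layout and the file names used earlier in this post:

    import dlib
    import xml.etree.ElementTree as ET
    from skimage import io

    # Run the trained model back over the labeled images and print files
    # where the detection count disagrees with the number of labeled boxes.
    detector = dlib.simple_object_detector("detector.svm")
    for image in ET.parse("training.xml").iter("image"):
        fname = image.get("file")
        truth = len(list(image.iter("box")))
        found = len(detector(io.imread(fname)))
        if found != truth:
            print("%s: labeled %d, detected %d" % (fname, truth, found))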