Tuesday, February 3, 2015

Python Stuff and Real-Time Video Object Tracking

The new version of dlib is out today. As promised, there is now a full Python API for using dlib's state-of-the-art object pose estimation and learning tools.  You can see examples of this API here and here.  Thank Patrick Snape, one of the main developers of the menpo project, for this addition.

Also, I've added an implementation of the winning algorithm from last year's Visual Object Tracking Challenge.  This was a method described in the paper:
Danelljan, Martin, et al. "Accurate scale estimation for robust visual tracking." Proceedings of the British Machine Vision Conference BMVC. 2014.
You can see some videos showing dlib's implementation of this new tracker in action on YouTube:


All these videos were processed by exactly the same piece of software.  No hand tweaking or any funny business.  The only required input (other than the raw video) is a bounding box on the first frame and then the tracker automatically follows whatever is inside the box after that.  The whole thing runs at over 150fps on my desktop.  You can see an example program showing how to use it here, or just go download the new dlib instead :)
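
For the curious, the core loop is tiny. Here is a minimal sketch of the Python API (assuming dlib is built with its Python bindings; the `to_xywh` helper and the deferred import are conveniences for this sketch, not part of dlib):

```python
def to_xywh(left, top, right, bottom):
    """Convert a (left, top, right, bottom) box, possibly floating point
    as returned by get_position(), to an integer (x, y, w, h) tuple."""
    return (int(round(left)), int(round(top)),
            int(round(right - left)), int(round(bottom - top)))

def track(frames, init_box):
    """Track whatever is inside init_box = (left, top, right, bottom) on
    the first frame, yielding one (x, y, w, h) box per subsequent frame."""
    import dlib  # deferred so the helper above works without dlib installed
    frames = iter(frames)
    tracker = dlib.correlation_tracker()
    left, top, right, bottom = init_box
    # The only required input: a bounding box on the first frame.
    tracker.start_track(next(frames), dlib.rectangle(left, top, right, bottom))
    for frame in frames:
        tracker.update(frame)         # returns a confidence score
        pos = tracker.get_position()  # a floating-point drectangle
        yield to_xywh(pos.left(), pos.top(), pos.right(), pos.bottom())
```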

I've also finally posted the paper I've been writing on dlib's structural SVM based training algorithm, which is the algorithm behind the easy to use object detector.

88 comments :

Shervin Emami said...

Great addition, the object tracker seems quite robust in the video!

Davis King said...

Thanks! :)

Yeah, for objects that don't undergo rapid out of plane rotations it works pretty well.

Stefanelus said...

hey Davis,

I tried to test the tracker and when I execute the sample I get something like this:

Error detected in function void __thiscall dlib::matrix,struct dlib::row_major_layout>::set_size(long,long).

Failing expression was (NR == 0 || NR == rows) && ( NC == 0 || NC == cols) && rows >= 0 && cols >= 0.
void matrix::set_size(rows, cols)
You have supplied conflicting matrix dimensions
rows: 0
cols: 0
NR: 0
NC: 1
this: 008EFBF0

I used the frames provided in the library.

Best regards,
Stefan

Davis King said...

Oops. I have an assert statement triggering in debug mode that I need to fix. However, if you run it in release mode it will work fine.

Running in debug mode is very slow anyway (http://dlib.net/faq.html#Why%20is%20dlib%20slow?)

Stefanelus said...

It did the trick. From the posted video the tracker looks really cool.

Stefanelus said...

Dear Davis,

I'm trying to classify some image patches (96 x 96) which are faces. I have a few images per subject, around 30 up to 50 patches.

I have played with the image recognition in OpenCV but it is very sensitive to a lot of things.

My question is: can I use a feature descriptor from dlib and then train a classifier? In dlib I saw SURF and a few other feature descriptors.

What is your advice? Would it make sense to use some descriptors from dlib for image recognition?

Best regards,
Stefan

Davis King said...

Sure, you can try using extract_highdim_face_lbp_descriptors(), extract_fhog_features(), or extract_uniform_lbp_descriptors() with a linear SVM. Any of those features generally give reasonable results for this kind of thing.

Stefanelus said...

Many thanks, I'll give it a try.

Unknown said...

Dear Davis,

I noticed the tracker runs at over 150fps on your desktop, but in my testing I only get about 50fps.
Can you help me figure out where the problem occurred?

Best,
Max

Davis King said...

Does this answer your question? http://dlib.net/faq.html#Whyisdlibslow

Unknown said...

Hello Davis, this seems interesting and the tracker looks robust, but what happens, for example, when in the next frame the object isn't on screen anymore? Do you get an error message? Or is there a way I can initialize the tracker again on another object I have detected? Thanks in advance!

Davis King said...

The correlation tracker doesn't deal with or detect any of those cases. To get an entire tracking system you must combine it with many other tools, and how you do that depends on your application.

Unknown said...

Thanks for the response! Somehow I was able to reset the tracking of the pedestrian when it left my region of interest, and it worked pretty well. Now I have another doubt: sometimes on semi-occluded pedestrians the bounding box jumps from the tracked one to another. Can I improve the performance by tweaking something in the algorithm, or do I have to use another approach for those cases?

Davis King said...

The most common approach is to use something like this correlation tracker to generate short tracks (people call them tracklets). So you have to be able to identify when the tracker will fail so you can chop its output into tracklets. Then you use some additional processing to figure out which tracklets should associate together. There is a large literature on this. I would google for tracklet association and terms like that.

Personally, I would use http://dlib.net/ml.html#structural_assignment_trainer to perform tracklet association.
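
To make the tracklet idea concrete, here is a rough pure-Python sketch of one very simple association rule: greedily link a tracklet that just ended to one that just started when their boundary boxes overlap enough. The IoU criterion and the 0.3 threshold are illustrative assumptions, not something prescribed by dlib:

```python
def iou(a, b):
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    il, it = max(a[0], b[0]), max(a[1], b[1])
    ir, ib = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ir - il) * max(0, ib - it)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def link_tracklets(ended, started, min_iou=0.3):
    """Greedily pair tracklets that ended with tracklets that started,
    matching the last box of one against the first box of the other.
    ended/started map tracklet id -> (left, top, right, bottom)."""
    candidates = sorted(((iou(b1, b2), i, j)
                         for i, b1 in ended.items()
                         for j, b2 in started.items()),
                        key=lambda c: c[0], reverse=True)
    links, used_old, used_new = [], set(), set()
    for score, i, j in candidates:
        if score < min_iou:
            break  # remaining candidates overlap too little
        if i in used_old or j in used_new:
            continue  # each tracklet joins at most one link
        links.append((i, j))
        used_old.add(i)
        used_new.add(j)
    return links
```

Real tracklet-association systems use richer affinities (appearance, motion, time gaps); this only shows the shape of the computation.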

Unknown said...

How could I make my own landmark detector xml file so that I could train the program to detect cars for example?

Davis King said...

You can use the imglab program in the tools subfolder to label images.

Unknown said...

Is there a python version to imglab?

Davis King said...

Imglab is a graphical program. You don't need to look at its source code to use it so it doesn't matter what language it's written in.

Unknown said...

When do you expect that this will be usable from Python?

Davis King said...

Right now :)

see https://github.com/davisking/dlib/blob/master/python_examples/correlation_tracker.py

Unknown said...

Thank you! Nice to be able to do object detection AND tracking from Python.

Unknown said...

@Davis King, could you make a .exe for the imglab program so that everyone could run it? If not, is there any similar program?

Davis King said...

Why not compile it yourself? You just run cmake in the folder and it will shoot out the exe.

Unknown said...

I have no experience with C++ and I've tried to compile it but I get errors. I also have a problem with correlation_tracker.py: I receive a no-module error.

Davis King said...

The README.txt file in tools/imglab tells you exactly what to type to compile it. It should have worked. What happened when you tried those commands?

Unknown said...

I believe there were some import errors. How would I solve the correlation tracker problem?

AttributeError: 'module' object has no attribute 'correlation_tracker'

Davis King said...

Did you compile the dlib library code from https://github.com/davisking/dlib?

The correlation tracker was only added to the python interface a few days ago so if you are trying to use an older version it won't work.

Unknown said...

I receive an error regarding cl when running cmake .. to compile imglab.

Unknown said...

My goal is to blur all heads in any police body camera video. My thought is to do head detection and then track forwards and backwards any detection with this real-time video object tracking script. Hopefully then it won't miss much. Any suggestions for making it efficient? Is there a better way than running the script per each detection?

Davis King said...

I would just run the face detector on each frame and not worry about tracking.

Unknown said...

How do I deal with half a head etc then?

Davis King said...

The tracker might not work any better for partially occluded heads. You will just have to experiment and see what works.

Unknown said...

Shucks. I was hoping to be able to keep people's heads blurred as they leave the frame. Thank you for the quick responses.

Unknown said...

How do I integrate the code with a live video stream?

Unknown said...

I ran tracking on the guy on the left side of frame 1 of https://www.youtube.com/watch?v=F0HkplIekOQ When the officer walks away for a few frames, the guy from frame 1 is never tracked again. Is there any way to track in a situation like this without redrawing the rectangles each time? Head detection hasn't been very successful; I ran into too many instances of "killed", so I gave up on training.

Unknown said...

Hi Davis,

Thanks for the wonderful algorithm; it looks impressive in the video.

I followed the instructions at the top of the Python example and successfully ran the .bat file. But I still cannot import dlib and get the error 'ImportError: No module named dlib'. I'm totally a beginner in both computer vision and Python, could you please help me out of this problem? (BTW, I'm using Mac OS 10.10 and Python 2.7)

Thanks~

Davis King said...

Did you run the python example by typing

python correlation_tracker.py

From within the python_examples folder?

Unknown said...

Thanks for the quick response. I solved the importing thing by moving dlib.so to the site-packages folder, but another error arose: AttributeError: 'module' object has no attribute 'image_window'. Any suggestions?

I found some hints on GitHub saying "Deleted examples build and recompiled, and now I can't recreate the error," but I'm kind of confused about what should be deleted and what should be recompiled (maybe compile_dlib_python_module.bat again)?

Unknown said...

Sorry, I missed your comment at the bottom of the GitHub issue, and now I can perfectly use dlib as well as the correlation tracking method. I tried the tracking algorithm on my testing video and the performance is awesome!!

Unknown said...

Hi,
I'm trying to build a camera system which is able to detect people staying in one place for a long time. Can I use dlib to detect multiple people from a live video feed that has already been subjected to background subtraction (BackgroundSubtractorGMG)? Thank you.

Davis King said...

Yes, there are many tools useful for that in dlib. I would try training a HOG filter to find the people. See http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html for example.

Unknown said...

Thank you Mr. Davis for your advice.

Unknown said...

Hello, I have a problem with the speed of the correlation tracker. I am using Linux and I compiled the library as written in the compilation instructions (with release mode on). Even so, with the provided test example at resolution 320x480 I only get about 25-30 fps, which is far from 150 fps.
Is this because I am using dlib's Python API?

Thank you for your answer!

Davis King said...

No, the Python API isn't too much slower than the C++ API. Maybe your computer is super slow? Maybe you didn't really compile it in release mode. I don't know. I just tried it on my computer and I get 150fps in C++ including file I/O. In Python I'm getting 107fps but only because Python's image loading is much slower.

Dashesy said...

Any chance you'll submit dlib to PyPI? It would make the impact orders of magnitude higher. Thanks for a great utility.

Davis King said...

I'm not going to but you are welcome to do it if you are interested :)

Andrés Felipe said...

Hi Davis, I'm having problems when I try to compile the program in Eclipse.

make all
Building target: Tracker_dlib
Invoking: Cross G++ Linker
g++ -L/usr/lib/ -o "Tracker_dlib" ./src/Tracker_dlib.o -lpthread -lblas -llapack -ljpeg -lpng -lX11
/usr/bin/ld: cannot find -lblas
/usr/bin/ld: cannot find -llapack

However, when I check for libblas and liblapack, these are the outputs that I get.


anfedres@anfedres-ThinkPad-W530:~/Documents/dlib-18.17/examples/build$ ldconfig -p | grep liblapack
liblapack.so.3 (libc6,x86-64) => /usr/lib/liblapack.so.3
anfedres@anfedres-ThinkPad-W530:~/Documents/dlib-18.17/examples/build$ ldconfig -p | grep libblas
libblas.so.3 (libc6,x86-64) => /usr/lib/libblas.so.3

I've added already /usr/lib/ to the library search path. Any clue which could be the problem?

Thank you.



Davis King said...

There are instructions for compiling dlib here: http://dlib.net/compile.html. CMake can also generate an Eclipse project if you really want to use Eclipse.

Andrés Felipe said...

Solved, thanks Davis.

Andrés Felipe said...

Is there a way to make the tracker recover if it is lost? Thanks Davis.

Davis King said...

No, you will need to include it inside some larger tracking framework to deal with that sort of issue.

Unknown said...
This comment has been removed by the author.
Unknown said...

Hi! I have multiple (x,y) coordinates for multiple people in each frame. Is it possible to track multiple people using this tracker? I saw in one of your previous comments that you have suggested using HOG but I already have my coordinates and I just need to track those multiple targets.
I see that after we start tracking, all I can do is send my image to tracker.update().
Is it possible to see what tracker.update() is actually doing?

Davis King said...

Create multiple instances of the tracker, one for each object to track.
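
A rough sketch of that pattern (the `min_confidence` threshold and the policy of dropping a tracker whose update() score falls below it are illustrative assumptions, not official dlib values):

```python
def start_trackers(frame, boxes):
    """Start one dlib.correlation_tracker per initial (l, t, r, b) box."""
    import dlib  # deferred so update_trackers below is testable without dlib
    trackers = []
    for (l, t, r, b) in boxes:
        tr = dlib.correlation_tracker()
        tr.start_track(frame, dlib.rectangle(l, t, r, b))
        trackers.append(tr)
    return trackers

def update_trackers(trackers, frame, min_confidence=7.0):
    """Update every tracker on the new frame, keeping only those whose
    update() score clears the (made-up) confidence threshold."""
    return [tr for tr in trackers if tr.update(frame) >= min_confidence]
```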

mces89 said...

Hi, I want to do some visual tracking of objects in video. I'm a little bit confused about why this kind of tracking does not need object detection. What's the difference between the following two approaches?

1. object detection for each frame, and then do some post-processing
2. object detection for the first frame, then do what was done in dlib.

Thanks.

Chicharito said...

Hi,

I'm trying to use dlib with the Qt framework (http://www.qt.io/). How do I push a QImage or QVideoFrame into a correlation_tracker object?

Thanks!

Unknown said...

Hi Davis, thanks for the nice video. May I ask if I/we can get the videos without the boxes drawn, so that I can test them?

Kind regards,
Tian

Unknown said...

Hi Davis,

First of all, thanks for your work! I would like to ask you a question. I am using the vot-toolkit to evaluate your correlation tracker (DSST) in order to compare your results with the ones obtained with the original DSST (Matlab). I get worse results on the VOT 2014 data set with your tracker than with the original DSST. Why does this happen?

Thanks again! Have a nice day.

Davis King said...

I also ran dlib's version on the vot-toolkit and didn't get results as good as what was reported in the DSST paper. I'm not sure why that is but my guess is that there are additional things they did beyond what is reported in the paper. Maybe there is some reacquisition logic? I'm not sure.

Unknown said...

Hi Davis,

Yeah! I was examining the dlib code (which I think is really good) and comparing it with the original Matlab version (https://github.com/gnebehay/DSST). The most notable difference that I encountered was that the dlib version always uses square filters (64 x 64), but the Matlab version adapts the filter size to the patch size. Do you think that could be an important point?

Thanks!

Davis King said...

I tried it both ways when I was implementing it and there didn't seem to be any significant difference in accuracy between square vs. non-square filter shapes. It was a little bit faster and simpler to use square filters so I did it that way. Although you never know, maybe that's part of the difference.

Anguo Yang said...

Hi, Davis, can this video object detector be used as a head counter, as in this video:
https://www.youtube.com/watch?v=OWab2_ete7s

The head area (camera above) could also be seen as one "type of" visual object.

eyebies said...
This comment has been removed by the author.
Unknown said...

Hi Davis,
are you open for custom jobs? I need to create an app which can track objects. mike.sorochev@gmail.com

Drew Sun said...

Hello Davis.

First off, Kudos for the good work.

We are noticing that the confidence level of the tracker is high even when the subject has moved out of the video frame. Can you please confirm the range of values for the tracker confidence level we should be looking at? Should we be looking at some other parameter for continued tracking?

Thanks.

Davis King said...

The confidence value is only loosely correlated with track breaking. To get a good estimate of track failure you need to include additional machinery; what exactly depends on your application.

Drew Sun said...

In our tests, we are tracking faces of pedestrians. They walk in front of the camera and move out of view. Can you please clarify what you mean by additional machinery?

Thanks for the quick response.

Davis King said...

I would run a face detector every few frames to make sure the objects are still present.
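
One simple way to wire that check up, sketched in pure Python. The every-10-frames cadence and the center-inside-box test are illustrative assumptions, not dlib features, and the names in the trailing comment (`to_box`, `face_detector`, and the re-initialization step) are hypothetical placeholders:

```python
def box_center(box):
    """Center point of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return ((left + right) / 2.0, (top + bottom) / 2.0)

def track_confirmed(tracked_box, detections):
    """True if some fresh detection's center lies inside the tracked box,
    i.e. the detector still sees a face where the tracker says one is."""
    left, top, right, bottom = tracked_box
    for det in detections:
        cx, cy = box_center(det)
        if left <= cx <= right and top <= cy <= bottom:
            return True
    return False

# In the main loop, something like:
#   if frame_idx % 10 == 0:
#       detections = [to_box(d) for d in face_detector(frame)]
#       if not track_confirmed(tracked_box, detections):
#           ...  # drop or re-initialize the tracker
```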

Drew Sun said...

We are running face detector every few frames.


Wondering how this would behave when one or more tracked subjects leave the scene and a few others enter the camera's field of view. Not sure if running just a face detector suffices in this case.

Can you please confirm whether the confidence level is a good indicator of effective tracking when the subject just moves within the frame? Should we be looking at some other indicators?

Would a delimiter defining the perimeter of the frame for effective tracking help? I.e., don't track beyond a predefined boundary or something similar.


Davis King said...

Those are good ideas and you will need to test them out when you develop your system to see what works. That is the only way to know.

dapper dan said...

Great work. Can I modify the tracking parameters in Python, or do I need to recompile dlib for each change?

Davis King said...

You have to use C++ to do that.

Unknown said...

Really cool!

Unknown said...

Hi Davis

Thanks for creating dlib. I found it really useful.

I have a couple of questions about correlation_tracker:

1) How can I obtain relevant values of the tracking rectangle (e.g., center position, width, height, etc.)? It seems that I need to do something with get_position() but I cannot get the values I want.

2) When a tracked object moves fast, the tracker can lose it. Is there any way to minimize this possibility? (I know that I need to detect the object again in case it is lost.)

Thanks in advance.

Celebnews Corner said...
This comment has been removed by the author.
Celebnews Corner said...

Hi Davis, I tried to use dlib to place face landmarks but it gives this error whenever I try to use shape_predictor:

error : predictor = dlib.shape_predictor(predictor_path)
RuntimeError: Error deserializing a floating point number.
while deserializing a dlib::matrix
while deserializing object of type std::vector
while deserializing object of type std::vector
while deserializing object of type std::vector

I am using the latest version on my Raspberry Pi.
I need to get this done because I have a presentation for my final year project. Please, I need help!

Unknown said...
This comment has been removed by the author.
Unknown said...
This comment has been removed by the author.
Unknown said...

That is a good tracker. However, it unfortunately seems impossible to track multiple objects.

In my case there are many tiny objects, which I have no problem detecting. However, when it comes to tracking, I have no clue how to do it.

How exactly can one run multiple tracker.update() calls without much speed degradation? I mean, how is it possible to use parallel processing on the GPU to track each object individually? I have tried to reproduce a system like this: https://www.youtube.com/watch?v=3IR1h6i31JI
So far detection is good, but tracking fails miserably.

YZ R said...

Hi Davis,

I am particularly interested in the part of the video from 3:32 - 3:44, where two people cross and intersect.

I have implemented the tracker to track faces, and it works very well with one face on screen. When I tried to track two faces where they cross, it still works well while the tracked face is in front. However, when the tracked face is in the back, the front face will 'bring' the tracker away and the tracked object will then become the face in the back.

I am running the tracker on roughly 30fps video (real time from a webcam). This is unlike the video shown, where the tracker still recognizes the tracked object even when two people cross and intersect, regardless of whether the tracked object is in the back or the front.

Is there any additional algorithm applied in order to achieve the performance shown in the video? My understanding is that the tracker algorithm looks at the closest pixels in the bounding box between the current and subsequent frames, hence what I observed from my implementation should be correct.

Thank you.

Regards
YZ

Davis King said...

The video doesn't use any additional processing tricks. But in general this kind of algorithm will often, but not always, get confused if two similar looking objects briefly occlude each other. To make it more robust to this kind of thing you need to add some stronger appearance based features, like pulling out a face descriptor and using that to deal with track swaps. There is also an extended version of this algorithm that is better at disambiguating this kind of issue (http://openaccess.thecvf.com/content_cvpr_2017/papers/Mueller_Context-Aware_Correlation_Filter_CVPR_2017_paper.pdf), which was presented at last year's CVPR. I haven't added it to dlib yet though.
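
As a sketch of the face-descriptor idea: keep a reference descriptor for the tracked identity and compare it against a descriptor re-extracted from the current track box; if the distance jumps, suspect a track swap. The 0.6 threshold is the rule of thumb commonly used with dlib's face embeddings, but treat it here as an assumption, and descriptor extraction itself (e.g. via dlib's face recognition model) is omitted:

```python
import math

def descriptor_distance(a, b):
    """Euclidean distance between two face descriptors (equal-length
    sequences of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def probable_swap(reference, current, threshold=0.6):
    """True if the descriptor extracted from the current track box no
    longer matches the reference identity."""
    return descriptor_distance(reference, current) > threshold
```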

Unknown said...

Hi Davis, is it possible to use a GPU with this algorithm? For one up to 5 objects it works fine, but add one more and everything becomes very slow.
Best regards.
Martín.

Davis King said...

There isn't any GPU accelerated version of this.

Jay said...

Hi Davis,
I am currently using the correlation tracker to track speed limit signs in videos. The tracker works fine; however, when the speed sign goes out of the image the tracker returns negative x and y values. I used

tracker.update(current_image);
const dlib::drectangle pos = tracker.get_position();
cout << pos.left() << " " << pos.top() << " " << pos.right() << " " << pos.bottom() << endl;

I tested the tracker with three videos and I observe this behaviour whenever the speed sign goes out of the image. Please suggest whether this is an expected behaviour or not. Thanks.

Davis King said...

I assume it's going out of the image to the left or top? Those areas have negative coordinates, so this is expected.
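
A tiny pure-Python helper (a sketch, not part of dlib) can clip the possibly-negative box reported by get_position() to the frame, and flag when the object has left the image entirely:

```python
def clamp_box(box, width, height):
    """Clip a (left, top, right, bottom) box to the image bounds; negative
    coordinates (object off the top or left edge) clamp to zero."""
    left, top, right, bottom = box
    return (max(0, min(left, width)), max(0, min(top, height)),
            max(0, min(right, width)), max(0, min(bottom, height)))

def box_visible(box, width, height):
    """True if any part of the box is still inside the image."""
    left, top, right, bottom = clamp_box(box, width, height)
    return right > left and bottom > top
```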

Jay said...

Many thanks for confirming that negative coordinates are expected. As I drive, the speed signs in the captured video go out of the image to the top and left. I have now handled this case in my code.