dlib C++ Library: Python Stuff and Real-Time Video Object Tracking

Tuesday, February 3, 2015

The new version of dlib is out today. As promised, there is now a full Python API for using dlib's state-of-the-art object pose estimation and learning tools. You can see examples of this API here and here. Thank Patrick Snape, one of the main developers of the menpo project, for this addition.

Also, I've added an implementation of the winning algorithm from last year's Visual Object Tracking Challenge. This was a method described in the paper:

Danelljan, Martin, et al. "Accurate scale estimation for robust visual tracking." Proceedings of the British Machine Vision Conference BMVC. 2014.

You can see some videos showing dlib's implementation of this new tracker in action on YouTube:

All these videos were processed by exactly the same piece of software. No hand tweaking or any funny business. The only required input (other than the raw video) is a bounding box on the first frame and then the tracker automatically follows whatever is inside the box after that. The whole thing runs at over 150fps on my desktop. You can see an example program showing how to use it here, or just go download the new dlib instead :)
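For readers following along in Python, here is a minimal sketch of that workflow: give the tracker a box on the first frame, then call update on every later frame. It assumes a tracker object with the `start_track()`, `update()` and `get_position()` methods used by dlib's Python `correlation_tracker.py` example; the `track_sequence` helper itself is hypothetical and not part of dlib.

```python
from typing import Any, List, Sequence


def track_sequence(tracker: Any, frames: Sequence[Any], first_box: Any) -> List[Any]:
    """Follow whatever is inside first_box through all later frames.

    Hypothetical helper. Assumes `tracker` has dlib-style
    start_track(img, rect), update(img) and get_position() methods.
    Returns one position per frame, starting with the first frame.
    """
    if len(frames) == 0:
        return []
    # The only required input besides the raw video: a box on frame 0.
    tracker.start_track(frames[0], first_box)
    positions = [tracker.get_position()]
    for frame in frames[1:]:
        # Each update() moves the box to where the tracked content went.
        tracker.update(frame)
        positions.append(tracker.get_position())
    return positions
```

With dlib installed you would call it as `track_sequence(dlib.correlation_tracker(), frames, dlib.rectangle(left, top, right, bottom))`, where `frames` is a list of images (e.g. numpy arrays) and the rectangle is the first-frame box.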

I've also finally posted the paper I've been writing on dlib's structural SVM based training algorithm, which is the algorithm behind the easy-to-use object detector.

90 comments :

Shervin Emami said...: Great addition, the object tracker seems quite robust in the video!; February 4, 2015 at 12:58 AM
Davis King said...: Thanks! :)

Yeah, for objects that don't undergo rapid out of plane rotations it works pretty well.; February 4, 2015 at 8:14 PM
Stefanelus said...: hey Davis,

I tried to test the tracker and when I execute the sample I get something like this:

Error detected in function void __thiscall dlib::matrix<…,struct dlib::row_major_layout>::set_size(long,long).

Failing expression was (NR == 0 || NR == rows) && ( NC == 0 || NC == cols) && rows >= 0 && cols >= 0.
void matrix::set_size(rows, cols)
You have supplied conflicting matrix dimensions
rows: 0
cols: 0
NR: 0
NC: 1
this: 008EFBF0

I used the frames provided in the library.

Best regards,
Stefan; February 15, 2015 at 6:42 AM
Davis King said...: Oops. I have an assert statement triggering in debug mode that I need to fix. However, if you run it in release mode it will work fine.

Running in debug mode is very slow anyway (http://dlib.net/faq.html#Why%20is%20dlib%20slow?); February 15, 2015 at 8:09 AM
Stefanelus said...: That did the trick. From the posted video, the tracker looks really cool.; February 15, 2015 at 10:26 AM
Stefanelus said...: Dear Davis,

I'm trying to classify some image patches (96x96) which are faces. I have a few images per subject, around 30 to 50 patches.

I have played with the image recognition from OpenCV, but it is very sensitive to a lot of things.

My question is whether I can use a feature descriptor from dlib and then train a classifier. In dlib I saw SURF and a few other feature descriptors.

What is your advice? Would it make sense to use some descriptors from dlib for image recognition?

Best regards,
Stefan; March 2, 2015 at 9:06 AM
Davis King said...: Sure, you can try using extract_highdim_face_lbp_descriptors(), extract_fhog_features(), or extract_uniform_lbp_descriptors() with a linear SVM. Any of those features generally give reasonable results for this kind of thing.; March 2, 2015 at 6:03 PM
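As a rough illustration of the "features plus linear SVM" recipe, the sketch below trains a binary linear SVM by full-batch subgradient descent on the hinge loss. It assumes each patch has already been turned into a flat list of floats by one of the extractors named above. The trainer is a generic stand-in written for illustration, not dlib's own linear SVM trainer (e.g. `svm_c_linear_trainer` in C++), which you would normally use instead; the `lam`, `lr` and `iters` defaults are untuned guesses.

```python
from typing import List, Sequence


def train_linear_svm(features: Sequence[Sequence[float]], labels: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     iters: int = 1000) -> List[float]:
    """Train a linear SVM by full-batch subgradient descent.

    Minimises lam/2 * |w|^2 + mean(max(0, 1 - y * w.x)). A bias term is
    learned by appending a constant 1.0 to every feature vector.
    Labels must be +1 or -1; features must be non-empty.
    """
    data = [list(x) + [1.0] for x in features]
    n, dim = len(data), len(data[0])
    w = [0.0] * dim
    for _ in range(iters):
        # Start from the gradient of the L2 regulariser.
        grad = [lam * wi for wi in w]
        for x, y in zip(data, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1.0:
                # Hinge loss is active: its subgradient is -y * x / n.
                for j in range(dim):
                    grad[j] -= y * x[j] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 for one feature vector (bias is the last weight)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

For several subjects, one option is to train one such classifier per subject (one-vs-rest) and pick the subject whose classifier gives the highest score.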
Stefanelus said...: Many thanks, I'll give it a try.; March 3, 2015 at 2:50 AM
Unknown said...: Dear Davis,

I noticed the tracker runs at over 150fps on your desktop, but in my testing it only gets about 50fps.
Can you help me figure out where the problem occurred?

Best,
Max; March 4, 2015 at 12:00 AM
Davis King said...: Does this answer your question? http://dlib.net/faq.html#Whyisdlibslow; March 4, 2015 at 6:52 AM
Unknown said...: Hello Davis, this seems interesting and the tracker looks robust, but what happens, for example, when the object isn't on screen anymore in the next frame? Do you get an error message? Or is there a way I can initialize the tracker again on another object I have detected? Thanks in advance!; April 22, 2015 at 11:05 AM
Davis King said...: The correlation tracker doesn't deal with or detect any of those cases. To get an entire tracking system you must combine it with many other tools, and how you do that depends on your application.; April 22, 2015 at 5:09 PM
Unknown said...: Thanks for the response! Somehow I was able to reset the tracking of the pedestrian when it left my region of interest, and it worked pretty well. Now I have another question: sometimes with semi-occluded pedestrians the bounding box jumps from the tracked one to another. Can I improve the performance by tweaking something in the algorithm, or do I have to use another approach for those cases?; April 23, 2015 at 10:33 PM
Davis King said...: The most common approach is to use something like this correlation tracker to generate short tracks (people call them tracklets). So you have to be able to identify when the tracker will fail so you can chop its output into tracklets. Then you use some additional processing to figure out which tracklets should associate together. There is a large literature on this. I would google for tracklet association and terms like that.

Personally, I would use http://dlib.net/ml.html#structural_assignment_trainer to perform tracklet association.; April 24, 2015 at 6:48 AM
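The chopping step can be sketched like this, assuming you record the tracker's position and the confidence value returned by `update()` for every frame. The `min_confidence` threshold is a hypothetical value you would tune, and the confidence is only a rough signal of failure, so in practice you would likely combine it with other cues. The association step itself (e.g. with `structural_assignment_trainer`) is not shown.

```python
from typing import Any, List, Sequence, Tuple


def split_into_tracklets(positions: Sequence[Any], confidences: Sequence[float],
                         min_confidence: float) -> List[List[Tuple[int, Any]]]:
    """Chop one long track into tracklets wherever confidence drops too low.

    positions[i] and confidences[i] are the tracker's box and confidence for
    frame i. Frames below min_confidence (a hypothetical, tuned value) are
    treated as likely failures: they are discarded and end the current tracklet.
    Each tracklet is a list of (frame_index, box) pairs.
    """
    tracklets: List[List[Tuple[int, Any]]] = []
    current: List[Tuple[int, Any]] = []
    for frame, (box, conf) in enumerate(zip(positions, confidences)):
        if conf >= min_confidence:
            current.append((frame, box))
        elif current:
            tracklets.append(current)
            current = []
    if current:
        tracklets.append(current)
    return tracklets
```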
Unknown said...: How could I make my own landmark detector xml file so that I could train the program to detect cars for example?; May 20, 2015 at 2:20 PM
Davis King said...: You can use the imglab program in the tools sub folder to label images.; May 20, 2015 at 9:14 PM
Unknown said...: Is there a python version to imglab?; May 20, 2015 at 10:04 PM
Davis King said...: Imglab is a graphical program. You don't need to look at its source code to use it so it doesn't matter what language it's written in.; May 21, 2015 at 7:51 AM
Unknown said...: When do you expect that this will be usable from Python?; May 24, 2015 at 4:13 PM
Davis King said...: Right now :)

see https://github.com/davisking/dlib/blob/master/python_examples/correlation_tracker.py; May 24, 2015 at 7:43 PM
Unknown said...: Thank you! Nice to be able to do object detection AND tracking from Python.; May 24, 2015 at 7:46 PM
Unknown said...: @Davis King, Could you make a .exe for the imglab program so that everyone could run it? If not, is there any similar program??; May 24, 2015 at 8:23 PM
Davis King said...: Why not compile it yourself? You just run cmake in the folder and it will shoot out the exe.; May 24, 2015 at 8:41 PM
Unknown said...: I have no experience with C++, and I've tried to compile it but I get errors. I also have a problem with correlation_tracker.py: I receive a "no module" error.; May 24, 2015 at 9:33 PM
Davis King said...: The README.txt file in tools/imglab tells you exactly what to type to compile it. It should have worked. What happened when you tried those commands?; May 24, 2015 at 9:51 PM
Unknown said...: I believe there were some import errors. How would I solve the correlation tracker problem?

AttributeError: 'module' object has no attribute 'correlation_tracker'; May 24, 2015 at 10:11 PM
Davis King said...: Did you compile the dlib library code from https://github.com/davisking/dlib?

The correlation tracker was only added to the python interface a few days ago so if you are trying to use an older version it won't work.; May 25, 2015 at 8:15 AM
Unknown said...: I receive an error regarding cl when running cmake .. to compile imglab.; May 25, 2015 at 12:31 PM
Unknown said...: My goal is to blur all heads in any police body camera video. My thought is to do head detection and then track forwards and backwards any detection with this real-time video object tracking script. Hopefully then it won't miss much. Any suggestions for making it efficient? Is there a better way than running the script per each detection?; May 25, 2015 at 1:43 PM
Davis King said...: I would just run the face detector on each frame and not worry about tracking.; May 25, 2015 at 6:01 PM
Unknown said...: How do I deal with half a head etc then?; May 25, 2015 at 6:18 PM
Davis King said...: The tracker might not work any better for partially occluded heads. You will just have to experiment and see what works.; May 25, 2015 at 6:37 PM
Unknown said...: Shucks. I was hoping to be able to keep people's heads blurred as they leave the frame. Thank you for the quick responses.; May 25, 2015 at 6:53 PM
Unknown said...: How do I integrate the code with a live video stream?; May 26, 2015 at 2:55 PM
Unknown said...: I ran tracking on the guy on the left side of frame 1 of this video (https://www.youtube.com/watch?v=F0HkplIekOQ). When the officer walks away for a few frames, the guy in frame 1 is never tracked again. Is there any way to track in a situation like this without redrawing the rectangles each time? Head detection hasn't been very successful either: I ran into too many instances of "killed" while training, so I gave up on it.; May 30, 2015 at 8:13 PM
Unknown said...: Hi Davis,

Thanks for the wonderful algorithm; it looks impressive in the video.

I followed the instructions at the top of the Python example and successfully compiled it using the bat file. But I still cannot import dlib and get the error 'ImportError: No module named dlib'. I'm a total beginner in both computer vision and Python; could you please help me with this problem? (BTW, I'm using Mac OS 10.10 and Python 2.7)

Thanks~; June 1, 2015 at 5:04 AM
Davis King said...: Did you run the python example by typing

python correlation_tracker.py

From within the python_examples folder?; June 1, 2015 at 6:41 AM
Unknown said...: Thanks for the quick response. I solved the import problem by moving dlib.so to the site-packages folder, but another error arose: AttributeError: 'module' object has no attribute 'image_window'. Any suggestions?

I found a hint on GitHub saying "Deleted examples build and recompiled, and now I can't recreate the error." But I'm confused about what should be deleted and what should be recompiled (maybe run compile_dlib_python_module.bat again?); June 1, 2015 at 9:34 PM
Unknown said...: Sorry, I missed your comment at the bottom of the GitHub issue. Now I can use dlib, including the correlation tracking method, perfectly. I tried the tracking algorithm on my test video and the performance is awesome!; June 2, 2015 at 2:08 AM
Unknown said...: Hi,
I'm trying to build a camera system that can detect people staying in one place for a long time. Can I use dlib to detect multiple people in a live video feed that has already been through background subtraction (backgroundsubtractorgmg)? Thank you.; June 13, 2015 at 5:02 AM
Davis King said...: Yes, there are many tools useful for that in dlib. I would try training a HOG filter to find the people. See http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html for example.; June 13, 2015 at 8:03 AM
Unknown said...: Thank you, Mr. Davis, for your advice.; July 4, 2015 at 12:41 AM
Unknown said...: Hello, I have a problem with the speed of the correlation tracker. I am using Linux and I compiled the library as described in the compilation instructions (with release mode on). However, with the provided test example at a resolution of 320x480, I only get about 25-30 fps, which is far from 150 fps.
Is this because I am using dlib through the Python API?

Thank you for your answer!; July 17, 2015 at 8:14 PM
Davis King said...: No, the Python API isn't too much slower than the C++ API. Maybe your computer is super slow? Maybe you didn't really compile it in release mode. I don't know. I just tried it on my computer and I get 150fps in C++ including file I/O. In Python I'm getting 107fps but only because Python's image loading is much slower.; July 18, 2015 at 1:32 PM
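To see where the time goes in your own setup, one approach is to time image loading and `update()` separately, as sketched below. `measure_fps`, `load_frame` and `update` are hypothetical stand-ins: with dlib, `update` would be a tracker's `update` method and `load_frame` whatever reads frame `i` from disk.

```python
import time
from typing import Any, Callable, Tuple


def measure_fps(load_frame: Callable[[int], Any], update: Callable[[Any], Any],
                n_frames: int) -> Tuple[float, float]:
    """Return (fps including image loading, fps of update() alone).

    Hypothetical helper: times the two stages separately with a
    monotonic high-resolution clock.
    """
    load_time = 0.0
    update_time = 0.0
    for i in range(n_frames):
        t0 = time.perf_counter()
        frame = load_frame(i)  # e.g. read the i-th image file
        t1 = time.perf_counter()
        update(frame)          # e.g. tracker.update(frame)
        t2 = time.perf_counter()
        load_time += t1 - t0
        update_time += t2 - t1
    total = load_time + update_time
    fps_total = n_frames / total if total > 0 else float("inf")
    fps_update = n_frames / update_time if update_time > 0 else float("inf")
    return fps_total, fps_update
```

If the second number is much higher than the first, image loading rather than the tracker is the bottleneck.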
Dashesy said...: Any chance you could submit dlib to PyPI? It would make the impact orders of magnitude higher. Thanks for a great utility.; August 18, 2015 at 12:27 PM
Davis King said...: I'm not going to but you are welcome to do it if you are interested :); August 18, 2015 at 4:45 PM
Andrés Felipe said...: Hi Davis, I'm having problems when I try to compile the program in Eclipse.

make all
Building target: Tracker_dlib
Invoking: Cross G++ Linker
g++ -L/usr/lib/ -o "Tracker_dlib" ./src/Tracker_dlib.o -lpthread -lblas -llapack -ljpeg -lpng -lX11
/usr/bin/ld: cannot find -lblas
/usr/bin/ld: cannot find -llapack

However, when I check for libblas and liblapack, these are the outputs that I get.

anfedres@anfedres-ThinkPad-W530:~/Documents/dlib-18.17/examples/build$ ldconfig -p | grep liblapack
liblapack.so.3 (libc6,x86-64) => /usr/lib/liblapack.so.3
anfedres@anfedres-ThinkPad-W530:~/Documents/dlib-18.17/examples/build$ ldconfig -p | grep libblas
libblas.so.3 (libc6,x86-64) => /usr/lib/libblas.so.3

I've already added /usr/lib/ to the library search path. Any clue what the problem could be?

Thank you.; August 19, 2015 at 5:59 PM
Davis King said...: There are instructions for compiling dlib here http://dlib.net/compile.html. CMake can also generate an eclipse project if you really want to use eclipse.; August 19, 2015 at 7:40 PM
Andrés Felipe said...: Solved, thanks Davis.; August 19, 2015 at 7:44 PM
Andrés Felipe said...: Is there a way to make the tracker recover if it gets lost? Thanks, Davis.; August 20, 2015 at 10:30 AM
Davis King said...: No, you will need to include it inside some larger tracking framework to deal with that sort of issue.; August 20, 2015 at 10:05 PM
Unknown said...: Hi! I have multiple (x,y) coordinates for multiple people in each frame. Is it possible to track multiple people using this tracker? I saw in one of your previous comments that you have suggested using HOG but I already have my coordinates and I just need to track those multiple targets.
I see that after we start tracking, all I can do is send my image to tracker.update().
Is it possible to see what tracker.update() is actually doing?; August 24, 2015 at 12:48 AM
Davis King said...: Create multiple instances of the tracker, one for each object to track.; August 24, 2015 at 6:30 AM
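A minimal sketch of the one-tracker-per-object idea. Because the tracker is initialised from a box rather than a point, each (x, y) coordinate is first turned into a square box by the hypothetical `box_around` helper, whose `half_size` you would choose to cover a person. `make_tracker` would be `dlib.correlation_tracker` in practice; all helper names here are made up for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple


def box_around(x: int, y: int, half_size: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: square (left, top, right, bottom) centred on (x, y)."""
    return (x - half_size, y - half_size, x + half_size, y + half_size)


def start_trackers(make_tracker: Callable[[], Any], image: Any,
                   boxes: Sequence[Any]) -> List[Any]:
    """Create one independent tracker per object box on the same frame."""
    trackers = []
    for box in boxes:
        tracker = make_tracker()
        tracker.start_track(image, box)
        trackers.append(tracker)
    return trackers


def update_all(trackers: Sequence[Any], image: Any) -> List[Any]:
    """Advance every tracker on the new frame and return their positions."""
    for tracker in trackers:
        tracker.update(image)
    return [tracker.get_position() for tracker in trackers]
```

With dlib, each box would be wrapped as `dlib.rectangle(*box)` (integer coordinates) before being passed to `start_trackers`. The cost grows linearly with the number of objects, since every tracker runs its own update.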
mces89 said...: Hi, I want to do some visual tracking of objects in video, and I'm a little confused about why this kind of tracking does not need object detection. What's the difference between the following two approaches?

1. object detection for each frame, and then do some post-processing
2. object detection for the first frame, then do what was done in dlib.

Thanks.; August 25, 2015 at 11:57 AM
Chicharito said...: Hi,

I'm trying to use dlib with the Qt framework (http://www.qt.io/). How can I pass a QImage or QVideoFrame object into a correlation_tracker object?

Thanks!; September 4, 2015 at 4:11 AM
Unknown said...: Hi Davis, thanks for the nice video. May I ask if I can get the videos without the boxes drawn on them, so that I can test with them?

Kind regards,
Tian; October 27, 2015 at 11:42 AM
Unknown said...: Hi Davis,

First of all, thanks for your work! I would like to ask you a question. I am using the vot-toolkit tool to evaluate your correlation tracker (DSST) in order to compare your results with the ones obtained with the original DSST (Matlab). I get worse results with the VOT 2014 data set with your tracker than the original DSST. Why does this happen?

Thanks again! Have a nice day.; November 17, 2015 at 6:40 AM
Davis King said...: I also ran dlib's version on the vot-toolkit and didn't get results as good as what was reported in the DSST paper. I'm not sure why that is but my guess is that there are additional things they did beyond what is reported in the paper. Maybe there is some reacquisition logic? I'm not sure.; November 17, 2015 at 8:08 AM
Unknown said...: Hi Davis,

Yeah! I was examining the dlib code (which I think is really good) and comparing it with the original Matlab version (https://github.com/gnebehay/DSST). The most notable difference I found was that the dlib version always uses square filters (64 x 64), while the Matlab version adapts the filter size to the patch size. Do you think that could be an important factor?

Thanks!; November 17, 2015 at 11:55 AM
Davis King said...: I tried it both ways when I was implementing it and there didn't seem to be any significant difference in accuracy between square vs. non-square filter shapes. It was a little bit faster and simpler to use square filters so I did it that way. Although you never know, maybe that's part of the difference.; November 17, 2015 at 9:51 PM
Anguo Yang said...: Hi Davis, can this video object detector be used as a head counter,
as in this video:
https://www.youtube.com/watch?v=OWab2_ete7s

The head area (camera above) could also be seen as one "type of" visual object.; December 2, 2015 at 2:33 AM
Unknown said...: Hi Davis,
Are you open to custom work? I need to create an app that can track objects. mike.sorochev@gmail.com; March 23, 2016 at 11:52 AM
Drew Sun said...: Hello Davis.

First off, Kudos for the good work.

We are noticing that the ConfidenceLevel of the tracker is high even when the subject has moved out of the video frame. Can you please confirm the range of values for the tracker confidence level we should be looking at? Should we be looking at some other parameter for continued tracking?

Thanks.; May 13, 2016 at 6:41 PM
Davis King said...: The confidence value is only loosely correlated with track breaking. To get a good estimate of track failure you need to include additional machinery, and what exactly that is depends on your application.; May 13, 2016 at 6:46 PM
Drew Sun said...: In our tests, we are tracking faces of pedestrians. They walk in front of the camera and move out of view. Can you please clarify what you mean by additional machinery?

Thanks for the quick response.; May 13, 2016 at 10:58 PM
Davis King said...: I would run a face detector every few frames to make sure the objects are still present.; May 13, 2016 at 11:02 PM
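One way to wire that up, sketched under assumptions: every N frames, run the detector, convert its rectangles to `(left, top, right, bottom)` tuples, and compare them with the current tracker positions. The greedy intersection-over-union matching and the 0.5 overlap threshold below are illustrative choices, not something dlib provides. Tracks with no supporting detection become candidates to drop, and unclaimed detections become candidates for new trackers.

```python
from typing import List, Sequence, Tuple

# (left, top, right, bottom)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 when they do not overlap)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def check_tracks(tracked: Sequence[Box], detections: Sequence[Box],
                 min_iou: float = 0.5) -> Tuple[List[int], List[int]]:
    """Greedily match tracked boxes to fresh detections.

    Returns (indices of tracks with no supporting detection,
             indices of detections not claimed by any track).
    """
    unmatched_dets = set(range(len(detections)))
    lost = []
    for t_idx, t_box in enumerate(tracked):
        best, best_iou = None, min_iou
        for d_idx in unmatched_dets:
            overlap = iou(t_box, detections[d_idx])
            if overlap >= best_iou:
                best, best_iou = d_idx, overlap
        if best is None:
            lost.append(t_idx)           # no detection supports this track
        else:
            unmatched_dets.remove(best)  # each detection confirms one track
    return lost, sorted(unmatched_dets)
```

This also covers subjects leaving and entering the scene: departures show up as lost tracks, arrivals as unclaimed detections.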
Drew Sun said...: We are running face detector every few frames.

I'm wondering how this would behave when one or more tracked subjects leave the scene and a few others enter the camera's field of view. I'm not sure if running just a face detector suffices in this case.

Can you please confirm whether ConfidenceLevel is a good indicator of effective tracking when the subject just moves within the frame? Should we be looking at some other indicators?

Would a delimiter defining the perimeter of the frame for effective tracking help? i.e., don't track beyond a predefined boundary or something similar.; May 13, 2016 at 11:57 PM
Davis King said...: Those are good ideas and you will need to test them out when you develop your system to see what works. That is the only way to know.; May 14, 2016 at 6:25 AM
dapper dan said...: Great work. Can I modify the tracking parameters in python or do I need to recompile dlib for each change?; July 12, 2016 at 1:04 PM
Davis King said...: You have to use C++ to do that.; July 12, 2016 at 1:29 PM
Unknown said...: Really cool!; July 30, 2016 at 7:03 AM
Unknown said...: Hi Davis

Thanks for creating Dlib. I found it really useful.

I have a couple of questions about correlation_tracker:

1) How can I obtain the relevant values of the tracking rectangle (e.g., center position, width, height, etc.)? It seems that I need to do something with get_position(), but I cannot get the values I want.

2) When a tracked object moves fast, the tracker can lose it. Is there any way to minimize this possibility? (I know that I need to detect the object again in case it is lost.)

Thanks in advance.; January 30, 2017 at 10:23 PM
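A small sketch of deriving those values, assuming the rectangle returned by `get_position()` exposes `left()`, `top()`, `right()` and `bottom()`, as the C++ snippet later in this thread uses. The `box_geometry` helper is hypothetical. dlib's rectangle types also have their own `width()` and `height()` methods, whose exact pixel convention may differ by one from the `right - left` used here.

```python
from typing import Tuple


def box_geometry(left: float, top: float, right: float, bottom: float
                 ) -> Tuple[Tuple[float, float], float, float]:
    """Return ((center_x, center_y), width, height) for a box.

    Hypothetical helper: width = right - left, height = bottom - top.
    """
    width = right - left
    height = bottom - top
    center = ((left + right) / 2.0, (top + bottom) / 2.0)
    return center, width, height
```

Usage: `pos = tracker.get_position()` followed by `box_geometry(pos.left(), pos.top(), pos.right(), pos.bottom())`.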
Celebnews Corner said...: Hi Davis, I tried to use dlib to place face landmarks, but it gives this error whenever I try to use shape_predictor:

error : predictor = dlib.shape_predictor(predictor_path)
RuntimeError: Error deserializing a floating point number.
while deserializing a dlib::matrix
while deserializing object of type std::vector
while deserializing object of type std::vector
while deserializing object of type std::vector

I am using the latest version on my Raspberry Pi.
I need to get this done because I have a presentation for my final year project. Please, I need help!; May 9, 2017 at 2:01 PM
Unknown said...: That is a good tracker. However, it is unfortunately not possible to track multiple objects.

In my case there are many tiny objects, and I do not have any problems detecting them. However, when it comes to tracking, I have no clue how to do it.

How exactly can one run multiple tracker.update() calls without much degradation in speed?
I mean, is it possible to use parallel processing on a GPU to track each object individually?
I have tried to reproduce a system like this one: https://www.youtube.com/watch?v=3IR1h6i31JI
So far detection is good, but tracking fails miserably.; August 11, 2017 at 5:07 AM
YZ R said...: Hi Davis,

I am particularly interested in the part of the video from 3:32 to 3:44, where two people cross and intersect.

I have implemented the tracker to track faces, and it works very well with one face on screen. When I tried to track two faces that cross, it still works well when the tracked face is in front. However, when the tracked face is behind, the front face 'pulls' the tracker away and the tracker switches to the other face.

I am running the tracker on a roughly 30fps video (real time from a webcam). In the video, by contrast, the tracker still follows the tracked object when two people cross and intersect, regardless of whether the tracked object is behind or in front.

Is there any additional algorithm applied to achieve the performance shown in the video? My understanding is that the tracker looks at the closest pixels in the bounding box of the current and subsequent frames, so what I observed in my implementation should be correct.

Thank you.

Regards
YZ; February 26, 2018 at 4:09 AM
Davis King said...: The video doesn't use any additional processing tricks. But in general this kind of algorithm will often, but not always, get confused if two similar looking objects briefly occlude each other. To make it more robust to this kind of thing you need to add some stronger appearance based features like pull out a face descriptor and use that to deal with track swaps. There is also an extended version of this algorithm that is better at disambiguating this kind of issue (http://openaccess.thecvf.com/content_cvpr_2017/papers/Mueller_Context-Aware_Correlation_Filter_CVPR_2017_paper.pdf) which was presented at last year's CVPR. I haven't added it to dlib yet though.; February 26, 2018 at 6:54 AM
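To illustrate the descriptor idea: keep a reference face descriptor per track and flag a possible swap when the current face's descriptor drifts too far from it. This sketch assumes descriptors are plain sequences of floats (e.g. the 128-D vectors from dlib's `face_recognition_model_v1.compute_face_descriptor()`). The 0.6 Euclidean distance cutoff is the same-person threshold suggested in dlib's face recognition example; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def same_person(reference: Sequence[float], current: Sequence[float],
                max_distance: float = 0.6) -> bool:
    """True if two face descriptors are within max_distance (Euclidean)."""
    return math.dist(reference, current) < max_distance


def suspected_swaps(references: Sequence[Sequence[float]],
                    current: Sequence[Sequence[float]],
                    max_distance: float = 0.6) -> List[int]:
    """Indices of tracks whose current face no longer matches their reference."""
    return [i for i, (ref, cur) in enumerate(zip(references, current))
            if not same_person(ref, cur, max_distance)]
```

A flagged track could then be re-assigned to whichever reference descriptor its current face is closest to, rather than trusting the tracker's box alone.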
Unknown said...: Hi Davis, is it possible to use a GPU with this algorithm? For one to five objects it works fine, but if I add one more, everything becomes very slow.
Best regards.
Martín.; January 10, 2019 at 3:02 PM
Davis King said...: There isn't any GPU accelerated version of this.; January 10, 2019 at 7:18 PM
Jay said...: Hi Davis,
I am currently using the correlation tracker to track speed limit signs in videos. The tracker works fine; however, when the speed sign goes out of the image, the tracker returns negative x and y values. I used

tracker.update(current_image);
const auto pos = tracker.get_position();
cout << pos.left() << " " << pos.top() << " " << pos.right() << " " << pos.bottom() << endl;

I tested the tracker with three videos and I observe this behaviour whenever the speed sign goes out of the image. Please suggest whether this is an expected behaviour or not. Thanks.; March 24, 2019 at 10:57 AM
Davis King said...: I assume it's going out of the image to the left or top? Those areas have negative coordinates, so this is expected.; March 24, 2019 at 6:00 PM
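Since off-image coordinates are expected, the caller can decide when a track has effectively left the frame. This sketch treats boxes as continuous `(left, top, right, bottom)` tuples in an image spanning `[0, width] x [0, height]`, a simplification of dlib's inclusive pixel coordinates. The helper names and any cutoff you apply to `fraction_visible` (say 0.5) are assumptions.

```python
from typing import Optional, Tuple

# (left, top, right, bottom)
Box = Tuple[float, float, float, float]


def clip_to_image(box: Box, width: float, height: float) -> Optional[Box]:
    """Clip a box to the image; None if nothing of it is left inside."""
    left, top = max(box[0], 0), max(box[1], 0)
    right, bottom = min(box[2], width), min(box[3], height)
    if right <= left or bottom <= top:
        return None
    return (left, top, right, bottom)


def fraction_visible(box: Box, width: float, height: float) -> float:
    """Share of the box's area that still lies inside the image."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    clipped = clip_to_image(box, width, height)
    if clipped is None or area <= 0:
        return 0.0
    return (clipped[2] - clipped[0]) * (clipped[3] - clipped[1]) / area
```

For example, a sign whose box has slid half off the top-left corner would report a small visible fraction, and the application could stop tracking it there instead of using the negative coordinates.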
Jay said...: Many thanks for confirming that negative coordinates are expected. As I drive, the speed signs in the captured video go out of the image to the top and left. I now handle this case in my code.; March 26, 2019 at 10:48 AM