Tuesday, February 3, 2015

Python Stuff and Real-Time Video Object Tracking

The new version of dlib is out today. As promised, there is now a full Python API for using dlib's state-of-the-art object pose estimation and learning tools.  You can see examples of this API here and here.  Thanks to Patrick Snape, one of the main developers of the menpo project, for this addition.

Also, I've added an implementation of the winning algorithm from last year's Visual Object Tracking Challenge.  This is the method described in the paper:
Danelljan, Martin, et al. "Accurate scale estimation for robust visual tracking." Proceedings of the British Machine Vision Conference (BMVC), 2014.
You can see some videos showing dlib's implementation of this new tracker in action on YouTube:


All these videos were processed by exactly the same piece of software.  No hand tweaking or any funny business.  The only required input (other than the raw video) is a bounding box on the first frame; after that, the tracker automatically follows whatever is inside the box.  The whole thing runs at over 150fps on my desktop.  You can see an example program showing how to use it here, or just go download the new dlib instead :)
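
For the curious, the heart of that example boils down to something like the following sketch (the frame paths and the initial box are placeholders, and dlib.load_rgb_image assumes a reasonably recent dlib build; any routine that yields a numpy image works):

    import glob
    import dlib

    # Frames on disk, in order (the path pattern is a placeholder).
    frame_paths = sorted(glob.glob("video_frames/*.jpg"))

    tracker = dlib.correlation_tracker()
    win = dlib.image_window()

    # Seed the tracker with a bounding box on the first frame.
    img = dlib.load_rgb_image(frame_paths[0])
    tracker.start_track(img, dlib.rectangle(74, 67, 112, 153))

    # Every later frame only needs update(); the box follows the object.
    for path in frame_paths[1:]:
        img = dlib.load_rgb_image(path)
        tracker.update(img)
        win.clear_overlay()
        win.set_image(img)
        win.add_overlay(tracker.get_position())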

I've also finally posted the paper I've been writing on dlib's structural SVM based training algorithm, which is the algorithm behind the easy-to-use object detector.

88 comments:

  1. Great addition, the object tracker seems quite robust in the video!

  2. Thanks! :)

    Yeah, for objects that don't undergo rapid out-of-plane rotations it works pretty well.

  3. Hey Davis,

    I tried to test the tracker and when I execute the sample I get something like this:

    Error detected in function void __thiscall dlib::matrix<...,struct dlib::row_major_layout>::set_size(long, long).

    Failing expression was (NR == 0 || NR == rows) && (NC == 0 || NC == cols) && rows >= 0 && cols >= 0.
    void matrix::set_size(rows, cols)
    You have supplied conflicting matrix dimensions
    rows: 0
    cols: 0
    NR: 0
    NC: 1
    this: 008EFBF0

    I used the frames provided in the library.

    Best regards,
    Stefan

  4. Oops. I have an assert statement triggering in debug mode that I need to fix. However, if you run it in release mode it will work fine.

    Running in debug mode is very slow anyway (http://dlib.net/faq.html#Why%20is%20dlib%20slow?)

  5. It did the trick. From the posted video the tracker looks really cool.

  6. Dear Davis,

    I'm trying to classify some image patches (96 x 96) which are faces. I have a few images per subject, around 30 to 50 patches.

    I have played with the image recognition in OpenCV but it is very sensitive to a lot of things.

    My question is: can I use a feature descriptor from dlib and then train a classifier? In dlib I saw SURF and a few other feature descriptors.

    What is your advice? Would it make sense to use some descriptors from dlib for image recognition?

    Best regards,
    Stefan

  7. Sure, you can try using extract_highdim_face_lbp_descriptors(), extract_fhog_features(), or extract_uniform_lbp_descriptors() with a linear SVM. Any of those features generally give reasonable results for this kind of thing.
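
    (Those descriptor functions are part of dlib's C++ API. If you do the classifier half from Python, a minimal sketch of training a linear SVM on precomputed feature vectors could look like this; the toy 3-dimensional vectors stand in for whatever real descriptors you extract.)

    import dlib

    # One feature vector per face patch; +1 = target subject, -1 = others.
    x = dlib.vectors()
    y = dlib.array()
    x.append(dlib.vector([0.1, 0.5, 0.2]))  # toy descriptor for the subject
    y.append(+1)
    x.append(dlib.vector([0.9, 0.1, 0.7]))  # toy descriptor for someone else
    y.append(-1)

    trainer = dlib.svm_c_trainer_linear()
    trainer.c = 10  # regularization strength; pick via cross-validation
    classifier = trainer.train(x, y)
    print(classifier(dlib.vector([0.2, 0.4, 0.3])))  # > 0 means "subject"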

  8. Many thanks, I'll give it a try.

  9. Dear Davis,

    I noticed the tracker runs at over 150fps on your desktop, but in my testing it only gets about 50fps.
    Can you help me figure out where the problem occurred?

    Best,
    Max

  10. Does this answer your question? http://dlib.net/faq.html#Whyisdlibslow

  11. Hello Davis, this seems interesting and the tracker looks robust, but what happens, for example, when the object isn't on screen in the next frame? Do you get an error message? Or is there a way I can initialize the tracker again on another object I have detected? Thanks in advance!

  12. The correlation tracker doesn't deal with or detect any of those cases. To get an entire tracking system you must combine it with many other tools, and how you do that depends on your application.

  13. Thanks for the response! Somehow I was able to reset the tracking of the pedestrian when it left my region of interest, and it worked pretty well. Now I have another doubt: sometimes on semi-occluded pedestrians the bounding box jumps from the tracked one to another. Can I improve the performance by tweaking something in the algorithm, or do I have to use another approach for those cases?

  14. The most common approach is to use something like this correlation tracker to generate short tracks (people call them tracklets). So you have to be able to identify when the tracker will fail so you can chop its output into tracklets. Then you use some additional processing to figure out which tracklets should associate together. There is a large literature on this. I would google for "tracklet association" and terms like that.

    Personally, I would use http://dlib.net/ml.html#structural_assignment_trainer to perform tracklet association.

  15. How could I make my own landmark detector XML file so that I could train the program to detect cars, for example?

  16. You can use the imglab program in the tools subfolder to label images.

  17. Is there a Python version of imglab?

  18. Imglab is a graphical program. You don't need to look at its source code to use it, so it doesn't matter what language it's written in.

  19. When do you expect that this will be usable from Python?

  20. Right now :)

    see https://github.com/davisking/dlib/blob/master/python_examples/correlation_tracker.py

  21. Thank you! Nice to be able to do object detection AND tracking from Python.

  22. @Davis King, could you make a .exe for the imglab program so that everyone could run it? If not, is there any similar program?

  23. Why not compile it yourself? You just run cmake in the folder and it will shoot out the exe.

  24. I have no experience with C++ and I've tried to compile it but I get errors. I also have a problem with correlation_tracker.py: I receive a "no module" error.

  25. The README.txt file in tools/imglab tells you exactly what to type to compile it. It should have worked. What happened when you tried those commands?

  26. I believe there were some import errors. How would I solve the correlation tracker problem?

    AttributeError: 'module' object has no attribute 'correlation_tracker'

  27. Did you compile the dlib library code from https://github.com/davisking/dlib?

    The correlation tracker was only added to the python interface a few days ago so if you are trying to use an older version it won't work.

  28. I receive an error regarding cl when running cmake .. to compile imglab.

  29. My goal is to blur all heads in any police body camera video. My thought is to do head detection and then track each detection forwards and backwards with this real-time video object tracking script. Hopefully then it won't miss much. Any suggestions for making it efficient? Is there a better way than running the script for each detection?

  30. I would just run the face detector on each frame and not worry about tracking.

  31. How do I deal with half a head etc then?

  32. The tracker might not work any better for partially occluded heads. You will just have to experiment and see what works.

  33. Shucks. I was hoping to be able to keep people's heads blurred as they leave the frame. Thank you for the quick responses.

  34. How do I integrate the code with a live video stream?

  35. I ran tracking on the guy on the left side of frame 1 of https://www.youtube.com/watch?v=F0HkplIekOQ When the officer walks away for a few frames, the guy from frame 1 is never tracked again. Is there any way to track in a situation like this without redrawing the rectangles each time? Head detection hasn't been very successful; I ran into too many instances of "killed" so I gave up on training.

  36. Hi Davis,

    Thanks for the wonderful algorithm, it looks great in the video.

    I followed the instructions at the top of the Python example and successfully ran the .bat file. But I still cannot import dlib; I get the error 'ImportError: No module named dlib'. I'm totally a beginner in both computer vision and Python; could you please help me out with this problem? (BTW, I'm using Mac OS 10.10 and Python 2.7.)

    Thanks~

  37. Did you run the python example by typing

    python correlation_tracker.py

    from within the python_examples folder?

  38. Thanks for the quick response. I solved the importing problem by moving dlib.so to the site-packages folder, but another error arose: AttributeError: 'module' object has no attribute 'image_window'. Any suggestions?

    I found a hint on GitHub that says "Deleted examples build and recompiled, and now I can't recreate the error." But I'm kind of confused about what should be deleted and what should be recompiled (maybe run compile_dlib_python_module.bat again)?

  39. Sorry, I missed your comment at the bottom of the GitHub issue. Now I can use dlib as well as the correlation tracking method perfectly. I tried the tracking algorithm on my testing video and the performance is awesome!!

  40. Hi,
    I'm trying to build a camera which is able to detect people staying in one place for a long time... Can I use dlib to detect multiple people from a live video feed that has already been subjected to background subtraction (BackgroundSubtractorGMG)? Thank you...

  41. Yes, there are many tools useful for that in dlib. I would try training a HOG filter to find the people. See http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html for example.
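
    A rough sketch of that workflow from Python, using dlib's simple object detector (the XML and .svm file names are hypothetical; the XML comes from labeling your images with imglab):

    import dlib

    options = dlib.simple_object_detector_training_options()
    options.add_left_right_image_flips = True  # people are left/right symmetric
    options.C = 5  # SVM regularization; tune on your data
    dlib.train_simple_object_detector("people.xml", "people_detector.svm", options)

    # Later, run the trained HOG filter on each frame.
    detector = dlib.simple_object_detector("people_detector.svm")
    img = dlib.load_rgb_image("frame.jpg")  # placeholder frame
    boxes = detector(img)  # a list of dlib.rectangles, one per person found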

  42. Thank you Mr. Davis for your advice.

  43. Hello, I have a problem with the speed of the correlation tracker. I am using Linux and I compiled the library as written in the compilation instructions (with release mode on). Anyhow, with the provided test example at resolution 320x480, I only get about 25-30 fps, which is far from 150 fps.
    Is this because I am using dlib through the Python API?

    Thank you for your answer!

  44. No, the Python API isn't too much slower than the C++ API. Maybe your computer is super slow? Maybe you didn't really compile it in release mode. I don't know. I just tried it on my computer and I get 150fps in C++ including file I/O. In Python I'm getting 107fps but only because Python's image loading is much slower.

  45. Any chance you'll submit dlib to PyPI? It would make the impact orders of magnitude higher. Thanks for the great utility.

  46. I'm not going to but you are welcome to do it if you are interested :)

  47. Hi Davis, I'm having problems when I try to compile the program in Eclipse.

    make all
    Building target: Tracker_dlib
    Invoking: Cross G++ Linker
    g++ -L/usr/lib/ -o "Tracker_dlib" ./src/Tracker_dlib.o -lpthread -lblas -llapack -ljpeg -lpng -lX11
    /usr/bin/ld: cannot find -lblas
    /usr/bin/ld: cannot find -llapack

    However, when I check for libblas and liblapack, these are the outputs that I get:


    anfedres@anfedres-ThinkPad-W530:~/Documents/dlib-18.17/examples/build$ ldconfig -p | grep liblapack
    liblapack.so.3 (libc6,x86-64) => /usr/lib/liblapack.so.3
    anfedres@anfedres-ThinkPad-W530:~/Documents/dlib-18.17/examples/build$ ldconfig -p | grep libblas
    libblas.so.3 (libc6,x86-64) => /usr/lib/libblas.so.3

    I've already added /usr/lib/ to the library search path. Any clue what the problem could be?

    Thank you.

  48. There are instructions for compiling dlib here http://dlib.net/compile.html. CMake can also generate an eclipse project if you really want to use eclipse.

  49. Is there a way to make the tracker recover if it is lost? Thanks Davis.

  50. No, you will need to include it inside some larger tracking framework to deal with that sort of issue.

  51. This comment has been removed by the author.

  52. Hi! I have multiple (x,y) coordinates for multiple people in each frame. Is it possible to track multiple people using this tracker? I saw in one of your previous comments that you suggested using HOG, but I already have my coordinates and I just need to track those multiple targets.
    I see that after we start tracking, all I can do is send my image to tracker.update().
    Is it possible to see what tracker.update() is actually doing?

  53. Create multiple instances of the tracker, one for each object to track.
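
    In Python that could look like the following sketch (the boxes and frame paths are placeholders you would fill in):

    import dlib

    # One (left, top, right, bottom) box per person on the first frame.
    initial_boxes = [(10, 10, 60, 120), (200, 15, 250, 130)]  # placeholders
    first_frame = dlib.load_rgb_image("frames/0001.jpg")      # placeholder

    trackers = []
    for l, t, r, b in initial_boxes:
        tracker = dlib.correlation_tracker()
        tracker.start_track(first_frame, dlib.rectangle(l, t, r, b))
        trackers.append(tracker)

    # On every later frame, update each tracker independently.
    frame = dlib.load_rgb_image("frames/0002.jpg")            # placeholder
    for tracker in trackers:
        tracker.update(frame)
    positions = [tracker.get_position() for tracker in trackers]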

  54. Hi, I want to do some visual tracking of objects in video. I'm a little bit confused about why this kind of tracking does not need object detection. What's the difference between the following two approaches?

    1. object detection for each frame, and then do some post-processing
    2. object detection for the first frame, then do what was done in dlib.

    Thanks.

  55. Hi,

    I'm trying to use dlib with the Qt framework (http://www.qt.io/). How do I push a QImage or QVideoFrame object into the correlation_tracker?

    Thanks!

  56. Nice video,I am reading your post from the beginning, it was so interesting to read & I feel thanks to you for posting such a good blog, keep updates regularly.
    Regards,
    Python Training in Chennai|Python Taining

  57. Hi Davis, thanks for the nice video. May I ask if we can get the unboxed videos, so that I can test it?

    Kind regards,
    Tian

  58. Hi Davis,

    First of all, thanks for your work! I would like to ask you a question. I am using the vot-toolkit to evaluate your correlation tracker (DSST) in order to compare your results with the ones obtained with the original DSST (Matlab). I get worse results on the VOT 2014 data set with your tracker than with the original DSST. Why does this happen?

    Thanks again! Have a nice day.

  59. I also ran dlib's version on the vot-toolkit and didn't get results as good as what was reported in the DSST paper. I'm not sure why that is but my guess is that there are additional things they did beyond what is reported in the paper. Maybe there is some reacquisition logic? I'm not sure.

  60. Hi Davis,

    Yeah! I was examining the dlib code (which I think is really good) and compared it with the original Matlab version (https://github.com/gnebehay/DSST). The most notable difference I encountered was that the dlib version always uses square filters (64 x 64) but the Matlab version adapts the filter size to the patch size. Do you think that could be an important point?

    Thanks!

  61. I tried it both ways when I was implementing it and there didn't seem to be any significant difference in accuracy between square vs. non-square filter shapes. It was a little bit faster and simpler to use square filters so I did it that way. Although you never know, maybe that's part of the difference.

  62. Hi Davis, can this video object detector be used as a head counter,
    as in this video:
    https://www.youtube.com/watch?v=OWab2_ete7s

    The head area (camera above) could also be seen as one "type of" visual object.

  63. This comment has been removed by the author.

  64. Hi Davis,
    Are you open for custom work? I need to create an app which can track objects. mike.sorochev@gmail.com

  65. Hello Davis.

    First off, Kudos for the good work.

    We are noticing that the ConfidenceLevel of the tracker is high even when the subject has moved out of the video frame. Can you please confirm the range of values for the tracker confidence level we should be looking at? Should we be looking at some other parameter for continued tracking?

    Thanks.

  66. The confidence value is only loosely correlated with track breaking. To get a good estimate of track failure you need to include additional machinery; what exactly depends on your application.

  67. In our tests, we are tracking faces of pedestrians. They walk in front of the camera and move out of view. Can you please clarify what you mean by additional machinery?

    Thanks for the quick response.

  68. I would run a face detector every few frames to make sure the objects are still present.
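
    One possible shape for that, sketched in Python with a hypothetical re-detection interval:

    import glob
    import dlib

    detector = dlib.get_frontal_face_detector()
    tracker = dlib.correlation_tracker()
    REDETECT_EVERY = 10  # hypothetical interval; tune for your frame rate
    tracking = False

    for i, path in enumerate(sorted(glob.glob("frames/*.jpg"))):  # placeholder
        frame = dlib.load_rgb_image(path)
        if i % REDETECT_EVERY == 0:
            dets = detector(frame)
            tracking = len(dets) > 0
            if tracking:
                tracker.start_track(frame, dets[0])  # re-anchor on the detection
        elif tracking:
            tracker.update(frame)
        if tracking:
            pos = tracker.get_position()  # current estimate of the face box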

  69. We are running a face detector every few frames.

    Wondering how this would behave when one or more tracked subjects leave the scene and a few others enter the camera's field of view. Not sure if running just a face detector suffices in this case.

    Can you please confirm if ConfidenceLevel is a good indicator of effective tracking when the subject just moves within the frame? Should we be looking at some other indicators?

    Would a delimiter defining the perimeter of the frame for effective tracking help? I.e., don't track beyond a predefined boundary or something similar.

  70. Those are good ideas and you will need to test them out when you develop your system to see what works. That is the only way to know.

  71. Great work. Can I modify the tracking parameters in Python or do I need to recompile dlib for each change?

  72. Hi Davis

    Thanks for creating dlib. I find it really useful.

    I have a couple of questions about correlation_tracker:

    1) How can I obtain the relevant values of the tracking rectangle (e.g., center position, width, height, etc.)? It seems that I need to do something with get_position() but I cannot get the value I want.

    2) When a tracked object moves fast, the tracker can lose it. Is there any way to minimize this possibility? (I know that I need to detect the object again in case it is lost.)

    Thanks in advance.

  73. This comment has been removed by the author.

  74. Hi Davis, I tried to use dlib to place face landmarks but it gives out this error whenever I try to use shape_predictor:

    error : predictor = dlib.shape_predictor(predictor_path)
    RuntimeError: Error deserializing a floating point number.
    while deserializing a dlib::matrix
    while deserializing object of type std::vector
    while deserializing object of type std::vector
    while deserializing object of type std::vector

    I am using the latest version on my Raspberry Pi.
    I need to get this done because I have a presentation for my final year project. Please, I need help!

  75. This comment has been removed by the author.

  76. This comment has been removed by the author.

  77. That is a good tracker. However, it is unfortunately not possible to track multiple objects.

    In my case there are many tiny objects, and I do not have any problem detecting them. However, when it comes to tracking, I have no clue how to do it.

    How exactly can one run multiple tracker.update() calls without much speed degradation?
    I mean, how is it possible to use parallel processing on the GPU to track each object individually?
    I have tried to reproduce such a system: https://www.youtube.com/watch?v=3IR1h6i31JI
    So far the detection is good, but tracking fails miserably.

  78. Hi Davis,

    I am particularly interested in the part of the video from 3:32 - 3:44, where two people cross and intersect.

    I have implemented the tracker to track faces, and it works very well with one face on screen. When I tried to track two faces that cross, it still works well while the tracked face is in front. However, when the tracked face is in the back, the front face will 'bring' the tracker away, and the tracker ends up following the front face instead.

    I am running the tracker on roughly 30fps video (real time from a webcam). This is unlike your video, where the tracker still recognizes the tracked object even when two people cross and intersect, regardless of whether the tracked object is in the back or the front.

    Is there any additional algorithm applied in order to achieve the performance shown in the video? My understanding is that the tracker looks at the closest pixels in the bounding box between the current and subsequent frames, so what I observed from my implementation seems expected.

    Thank you.

    Regards
    YZ

  79. The video doesn't use any additional processing tricks. But in general this kind of algorithm will often, but not always, get confused if two similar looking objects briefly occlude each other. To make it more robust to this kind of thing you need to add stronger appearance based features, like pulling out a face descriptor and using that to deal with track swaps. There is also an extended version of this algorithm that is better at disambiguating this kind of issue (http://openaccess.thecvf.com/content_cvpr_2017/papers/Mueller_Context-Aware_Correlation_Filter_CVPR_2017_paper.pdf), which was presented at last year's CVPR. I haven't added it to dlib yet though.

  80. Hi Davis, is it possible to use a GPU with this algorithm? For one to 5 objects it works fine, but add one more and everything becomes very slow.
    Best regards.
    Martín.

  82. There isn't any GPU accelerated version of this.

  83. Hi Davis,
    I am currently using the correlation tracker to track speed limit signs in videos. The tracker works fine; however, when the speed sign goes out of the image the tracker returns negative x and y values. I used:

    tracker.update(current_image);
    const dlib::drectangle pos = tracker.get_position();
    cout << pos.left() << " " << pos.top() << " " << pos.right() << " " << pos.bottom() << endl;

    I tested the tracker with three videos and I observe this behaviour whenever the speed sign goes out of the image. Please confirm whether this is the expected behaviour or not. Thanks.

  84. I assume it's going out of the image to the left or top? Those areas have negative coordinates, so this is expected.
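
    If you only care about the visible part, one option is to clamp the returned box to the image before using it; a small Python sketch (frame is the image you tracked on):

    pos = tracker.get_position()  # a drectangle; it may extend past the image
    img_height, img_width = frame.shape[:2]
    left = max(0, int(pos.left()))
    top = max(0, int(pos.top()))
    right = min(img_width - 1, int(pos.right()))
    bottom = min(img_height - 1, int(pos.bottom()))
    if left >= right or top >= bottom:
        pass  # the box is entirely off-screen; treat the track as gone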

  85. Many thanks for confirming that negative coordinates are expected. As I drive, the speed signs in the captured video go out of the image to the top and left. I have now handled this case in my code.
