Abstract:
Visual tracking, or object tracking, is the process of estimating the state of a target in successive frames of a video sequence. It is an integral part of a plethora of applications, such as security, surveillance, navigation systems, traffic monitoring systems, human-computer interaction systems, and robotics, where the target is tracked in both stationary and dynamic environments. Visual tracking has remained a challenging problem in computer vision because of numerous factors such as occlusion, illumination variation, background clutter, pose change, scale variation, and deformation. To overcome these challenges, we propose an online analysis dictionary learning framework for visual tracking. Dictionary learning is a popular representation learning tool, and in recent years it has been successfully applied to a wide range of computer vision tasks, including image denoising, image super-resolution, face recognition, human action recognition, and classification. Synthesis-dictionary-based learning approaches have been applied to visual tracking as well. However, to the best of our knowledge, analysis-dictionary-based algorithms have not yet been applied to visual tracking. The main advantage of an analysis dictionary over a synthesis dictionary is that, for dictionaries of the same dimensions, an analysis dictionary can capture significantly more variability in the data than a synthesis dictionary.
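The distinction between the two models can be sketched as follows. This is a minimal NumPy illustration of the generic synthesis model (a signal is built from a few dictionary atoms) versus the generic analysis model (a signal is multiplied by an operator whose output is expected to be sparse); the dimensions and matrices are arbitrary assumptions for the sketch, not the paper's learned dictionaries.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 16  # signal dimension and number of atoms/rows (arbitrary choices)

# Synthesis model: the signal is BUILT from a few dictionary atoms.
D = rng.standard_normal((d, n))          # synthesis dictionary, d x n
a = np.zeros(n)
a[[2, 7, 11]] = rng.standard_normal(3)   # sparse code with 3 nonzeros
x = D @ a                                # x lies in the span of 3 atoms

# Analysis model: the signal is ANALYZED by an operator, and the
# resulting coefficient vector is expected to be sparse (cosparse).
Omega = rng.standard_normal((n, d))      # analysis dictionary/operator, n x d
z = Omega @ x                            # analysis coefficients

# Note: with a LEARNED Omega, z would be sparse for signals from the
# model; with this random Omega it generally is not. The snippet only
# shows the shapes and roles of the two representations: D and Omega
# have the same number of entries, but the analysis model constrains x
# through the rows of Omega it is (nearly) orthogonal to.
print(x.shape, z.shape)                  # (8,) (16,)
```

Intuitively, with comparable dimensions the cosparse analysis model describes a union of many more low-dimensional subspaces than the k-sparse synthesis model, which is the sense in which it can capture more variability.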
We have developed our algorithm in two stages. In the first stage, we track targets in video sequences using a single analysis dictionary. In the second stage, we develop a model based on multiple analysis dictionaries to track the object of interest. Through extensive experiments on video sequences from the OTB-50 dataset, we demonstrate that our algorithm outperforms trackers based on synthesis dictionary learning, as well as several state-of-the-art trackers that do not incorporate dictionary learning in their tracking approach.