dc.description.abstract |
Correlation filter-based visual trackers have demonstrated tremendous progress in object tracking. These trackers primarily use hierarchical features learned from multiple layers of a deep network. However, issues related to background awareness, deterministic aggregation of these features from various layers, difficulties in estimating variations in the scale or rotation of the tracked object, and challenges in effectively modelling the object’s appearance over long time periods leave substantial scope to improve performance. Such issues lead to poor discriminative power and cause the tracker to drift rapidly away from the target.
We propose ensemble and regularization techniques to achieve strong discriminative ability in object trackers. We first obtain an ensemble of weak correlation filters using an adaptive weighting strategy and an appearance model pool to adapt to large appearance changes. Further, the target scale and rotation parameters are estimated using a dedicated affine correlation filter.
Our experiments reveal that each feature channel encodes a different appearance cue of the target and that not all channels are equally important at different tracking steps. To this end, we propose a graph-based adaptive channel weighting strategy that assigns a weight to each channel based on the similarity of the encoded appearance information. To model channel importance and improve background awareness simultaneously, we introduce a sparse spatio-temporal regularizer with adaptive channel weights. We further kernelize the tracker to attain real-time performance. We extensively evaluate the trackers on publicly available datasets: Object Tracking Benchmark (OTB100), Visual Object Tracking (VOT) Benchmark 2016, VOT Benchmark 2017, Tracking Dataset, VOT Benchmark 2019, Temple Color 128, UAV123, LaSOT, and GOT10k. |
en_US |