Abstract:
Surveillance camera networks are a useful monitoring infrastructure for a variety of visual analytics applications, where high-level inferences and predictions can be made based on target tracking across the network. Most multi-camera tracking works focus on re-identification and trajectory association problems. However, as camera networks grow in size, the volume of data generated becomes enormous, and scalable processing of this data is imperative for deploying practical solutions. One common task in a camera network is inter-camera tracking (ICT). In ICT, once a target leaves a camera's field of view, it must be re-identified in the new camera feed after the transition. However, the relative distances between cameras and the indeterminate target-transition time make the re-identification (Re-ID) based ICT problem very challenging. As the number of Re-ID queries increases, both the false alarms and the computation time increase, which can adversely affect tracking performance.

In this dissertation, we ask the crucial question of whether to make a Re-ID query at each time step, and if so, which camera to query. This decision-making problem naturally fits a Reinforcement Learning (RL) framework, which we solve using a Deep Q-Network (DQN) approach for making camera selection decisions. We show that an RL policy reduces unnecessary Re-ID queries and therefore false alarms, scales well to larger camera networks, and is target-agnostic. We learn the camera-selection policy directly from data, with no reliance on the camera network topology. We further demonstrate that by using learned state representations, as opposed to hand-crafted state variables, we achieve state-of-the-art results on camera selection while reducing the training time of the RL policy. We also train the DQN in a semi-supervised manner to reduce its dependence on per-frame rewards: the DQN is trained with an accumulated discounted reward, and we show that it achieves performance comparable to a DQN trained with per-frame rewards. We demonstrate that camera selection in a camera network benefits applications such as multi-target multi-camera (MTMC) tracking and multi-camera target forecasting (MCTF). We report results on four datasets: NLPR_MCT, DukeMTMC, CityFlow, and WNMF.

In addition, we propose a new experience replay method that allows the DQN to work with an imbalanced replay buffer. We analyze why DQN fails to learn a good policy for longer target transitions in a camera network and show its limitations when the replay buffer is dominated by the most frequent action. To address this, we modify existing replay methods by using the reward received for each experience. We show that the proposed experience replay method, named SER, helps create diverse mini-batches for training the DQN and achieves better performance than existing experience replay methods.