Abstract:
Face recognition under controlled and constrained scenarios has reached a significant level of maturity with respect to performance and reliability. However, under unconstrained and uncontrolled settings, current state-of-the-art face recognition systems fail to yield a consistent level of performance. In recent years, several countries have experienced a high number of terrorist attacks, events of public unrest, and cross-border intrusions. As a preventive and investigative measure, governments around the world have installed surveillance cameras in public places such as railway and bus stations, airports, and shopping malls. Images acquired from these cameras (probes) are captured in an unconstrained and non-cooperative environment; hence their quality in terms of resolution, illumination, pose, spectrum, and so on may vary heavily. Images captured by these cameras are matched against a background database that contains images collected from government records such as passports and driving licenses. Such images (gallery) have much better and more consistent quality. Matching poor-quality probes with good-quality gallery images is a challenging problem, which involves utilizing auxiliary information (such as depth maps), improving the quality of the captured images, learning heterogeneity-aware models, and matching to optimize the top-k identification accuracy. This dissertation attempts to develop effective algorithms for face recognition in unconstrained and non-cooperative scenarios where the captured images are either low-resolution and/or in the NIR (Near-Infrared) spectrum, with low quality and inherent noise due to the in-the-wild capture setup commonly encountered in surveillance settings. The first contribution is primarily aimed at utilizing auxiliary sources of information for training a shared representation for face recognition in unconstrained environments. 
Low-cost depth sensors have opened new avenues for their usage in video surveillance scenarios. Most RGB-D face recognition methods utilize depth information by fusing it with RGB information, which results in enhanced recognition performance. However, in real-world surveillance scenarios, cameras are placed at distances too large for low-cost depth sensors to capture good-quality depth information. Such poor-quality depth information may not contribute significantly to face recognition. The first contribution is on learning a shared representation of RGB and depth information using a reconstruction-based deep neural network. The proposed network, once trained in offline mode, can generate the shared representation of RGB and depth
using only the RGB image. This feature-rich representation is then utilized for face identification, which allows the framework to be used in scenarios where a low-quality depth image, or no depth image at all, is captured. Experiments on two real-world RGB-D datasets, namely Kasparov and IIITD RGB-D, show the efficacy of the proposed method. The second contribution proposes a Generative Adversarial Network (GAN) based approach to learn an image-to-image transformation model for enhancing the resolution of a face image. Unsupervised GAN-based transformation methods in their native formulation may alter useful discriminative information in the transformed face images, which degrades the performance of face recognition algorithms applied to the transformed images. We propose a Supervised Resolution Enhancement and Recognition Network (SUPREAR-NET), which does not corrupt the useful class-specific information of the face image; it transforms a low-resolution probe image into a high-resolution one, followed by effective matching with the gallery using a trained discriminative model. We show results for cross-resolution face recognition on three datasets, including the FaceSurv face dataset, which contains poor-quality low-resolution videos captured at standoff distances of up to 10 meters from the camera.