Abstract:
Automated facial analysis has widespread applicability in scenarios such as image tagging, access control, and surveillance. Initial research focused primarily on face recognition in constrained settings, where the captured face image varied in pose, illumination, or expression. With the increased deployment of facial analysis models in real-world scenarios, dedicated research became necessary for data captured in unconstrained settings, including resolution variations. When subjects are captured at a large stand-off distance from the acquisition device, the region of interest (ROI) containing the face is often small (less than 32 × 32 pixels), requiring recognition of low resolution or very low resolution facial regions. Data captured in such unconstrained scenarios also often contain people wearing disguise accessories or with occluded faces, resulting in obfuscation of the face region and rendering automated face recognition challenging. To this effect, this dissertation focuses on facial analysis under low resolution and disguise variations. Two face recognition algorithms are presented for data captured in low and very low resolution settings, the Dual Directed Capsule Network and the DeriveNet model, followed by two novel datasets (Disguised Faces in the Wild 2018 and 2019) for facilitating research on disguised faces in the wild, along with a Disguise-Resilient face verification framework. This is followed by facial analysis models for attribute prediction in low and very low resolution settings.

We begin by developing deep learning algorithms for low and very low resolution face recognition, which suffers from limited interpretable information in the face images, resulting in ineffective feature extraction and classification. To address this challenge, we propose two novel algorithms: the Dual Directed Capsule Network (DirectCapsNet) and the DeriveNet model. Since low resolution face images contain limited meaningful information, we utilize a small set of high resolution samples to direct the classification model towards learning richer features. DirectCapsNet is built from a combination of convolutional and capsule layers, and is trained via three loss functions: the HR-Anchor loss, the Reconstruction loss, and the Margin loss. DirectCapsNet thus learns rich feature representations for very low resolution samples by exploiting the auxiliary high resolution samples during training. While capsule layers encode rich features, they are computationally expensive and contain a large number of trainable parameters. To address this limitation, the novel DeriveNet model is proposed for low and very low resolution face recognition. The model utilizes a set of high resolution images to learn an effective recognition model via a combination of two loss functions: the Derived-Margin softmax loss and the Reconstruction-Center loss. The Derived-Margin softmax loss estimates the inter-class variations between low resolution samples and models them as a margin for learning improved classification boundaries. Experimental analysis is performed on several challenging real-world datasets, including the Unconstrained College Students (UCCS) dataset, for facial regions with resolution below 24 × 24. Comparison with recent techniques demonstrates state-of-the-art results for the proposed algorithms.
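As an illustration of the high-resolution direction mechanism, the following PyTorch sketch shows one plausible reading of the HR-Anchor loss, in which the embedding of a low resolution face is pulled toward the embedding of a high resolution image of the same identity. The function and variable names are hypothetical, and the dissertation's exact formulation may differ.

```python
# Minimal sketch (assumed formulation, not the dissertation's code):
# the HR embeddings act as fixed anchors that direct the LR branch.
import torch
import torch.nn.functional as F

def hr_anchor_loss(lr_embeddings, hr_embeddings):
    """Mean squared distance between LR embeddings and the HR
    embeddings of the same identities (aligned by batch index)."""
    # Detaching the HR anchors is an assumption: it lets them guide the
    # LR features without being pulled toward the noisier LR branch.
    return F.mse_loss(lr_embeddings, hr_embeddings.detach())

# Toy usage with random 128-dimensional embeddings for a batch of 8.
lr = torch.randn(8, 128)
hr = torch.randn(8, 128)
loss = hr_anchor_loss(lr, hr)
```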
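Similarly, the sketch below illustrates the spirit of the Derived-Margin softmax loss: a per-class margin is derived from inter-class similarity (a class whose nearest rival lies close receives a larger margin) and injected into a cosine softmax. The margin derivation and scaling constants here are illustrative assumptions, not the dissertation's formulation.

```python
# Minimal sketch of a data-derived additive margin on a cosine softmax.
import torch
import torch.nn.functional as F

def derived_margin_softmax(features, weights, labels,
                           scale=30.0, base_margin=0.35):
    """Cross-entropy over scaled cosine logits, with a per-class margin
    derived from inter-class weight similarity (assumed formulation)."""
    w = F.normalize(weights, dim=1)   # (num_classes, dim)
    x = F.normalize(features, dim=1)  # (batch, dim)
    logits = x @ w.t()                # cosine similarities

    # Derive per-class margins: the more similar a class's nearest
    # rival, the larger its margin (hypothetical scaling).
    with torch.no_grad():
        sim = w @ w.t()
        sim.fill_diagonal_(-1.0)
        nearest = sim.max(dim=1).values
        margins = base_margin * 0.5 * (1.0 + nearest)

    # Subtract each sample's derived margin from its target logit only.
    one_hot = F.one_hot(labels, num_classes=w.size(0)).to(logits.dtype)
    logits = logits - margins[labels].unsqueeze(1) * one_hot
    return F.cross_entropy(scale * logits, labels)
```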
The next contribution of this dissertation lies in the area of disguised face recognition, where individuals obfuscate the face region, either intentionally to fool the automated system or unintentionally through the use of day-to-day facial accessories. To the best of our knowledge, most prior research has focused on disguised face recognition in constrained scenarios, with limited disguise accessories and other variations. Therefore, as part of this dissertation, we propose the Disguised Faces in the Wild (DFW) 2018 and DFW2019 datasets, containing face images with unconstrained disguise variations captured across different resolutions, acquisition devices, lighting, poses, and expressions. The datasets were released as part of two international workshops to facilitate research in this direction. We also present the Disguise-Resilient framework, built around a novel Disguise Encoder-Decoder network, with application to face verification. The efficacy of the proposed framework is demonstrated on the challenging DFW2018 and DFW2019 datasets, where it achieves state-of-the-art performance. Further, the arduous task of disguised face recognition in low resolution settings is explored and presented to the research community: baseline results and the performance of the proposed framework on face images with resolutions varying from 32 × 32 to 16 × 16 indicate that the problem demands dedicated attention from the community.

The final contribution of this dissertation focuses on developing deep learning algorithms for learning discriminative features, with application to attribute classification in low resolution face images. Automated prediction of attributes such as gender (male/female) or adulthood (adult/child) can serve as ancillary information for person identification, enhanced human-computer interaction, or age-based access restriction. As part of this contribution, two supervised variants of the deep learning based autoencoder are proposed for learning class-specific features: the Class Specific Mean Autoencoder and the Class Representative Autoencoder. Both models build on the observation that the mean feature of a given class carries class-specific information, which can be incorporated to learn rich, discriminative features. To the best of our knowledge, this is among the first research efforts focused on analyzing attributes in low resolution facial regions. The proposed autoencoder models extract meaningful information by modeling the inter-class and intra-class variations, resulting in improved performance for low resolution attribute classification from face images. Experimental evaluation on different datasets, with facial images of 24 × 24 and 16 × 16 resolution, demonstrates the effectiveness of the proposed techniques.
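To make the class-mean idea concrete, the following PyTorch sketch trains an autoencoder to reconstruct the precomputed mean image of the input's class rather than the input itself, so the learned encoding emphasizes class-specific content. The architecture, layer sizes, and loss shown are illustrative assumptions, not the exact Class Specific Mean Autoencoder or Class Representative Autoencoder formulations.

```python
# Minimal sketch of a class-mean-targeted autoencoder (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanTargetAutoencoder(nn.Module):
    """One-hidden-layer autoencoder; sizes are illustrative, chosen for
    24 x 24 grayscale inputs as studied in the experiments."""

    def __init__(self, in_dim=24 * 24, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def class_mean_loss(model, images, labels, class_means):
    """MSE between the reconstruction of each image and the precomputed
    mean image of its class (class_means: num_classes x in_dim)."""
    recon = model(images.flatten(1))
    return F.mse_loss(recon, class_means[labels])
```

Replacing the usual self-reconstruction target with the class mean is one simple way to inject class-specific information into the representation; the dissertation's models may combine this with other supervision.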