Person Re-identification

Sundararajan, Niranjan; Dubey, Vibhu; Subramanyam, A V (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1492

Title:	Person Re-identification
Authors:	Sundararajan, Niranjan; Dubey, Vibhu; Subramanyam, A V (Advisor)
Keywords:	Text-based Person Re-Identification; Image-Text Retrieval; Meta-learning; Information Retrieval; Machine Learning
Issue Date:	11-Dec-2023
Publisher:	IIIT-Delhi
Abstract:	Image-Text Retrieval (ITR) is the task of retrieving an image from its corresponding textual description, a textual description from its corresponding image, or both. Person Re-Identification (Person Re-ID) is a downstream task of ITR in which the images and texts describe persons. Our paper focuses only on text-based Person Re-ID, that is, retrieving images from their textual descriptions. The key challenge in Person Re-ID is that the textual modality is coarse-grained in its features, whereas the image modality is feature-dense, so the granularity gap between the two modalities is large. In addition, images and texts are inherently different kinds of data, leading to a large modality gap. Feature learning therefore becomes difficult. Another problem in this domain is the shortage of datasets, primarily due to pedestrians' privacy concerns about being photographed. A possible solution is to learn from a combination of datasets, using meta-learning to learn across datasets while retaining model robustness. In this paper, we aim to develop a model that learns effectively despite the image-text granularity gap while incorporating multiple datasets in its training through meta-learning. CUHK-PEDES, RSTPReid and ICFG-PEDES are the three available benchmarks for evaluating text-to-image (T2I) ReID methods. RSTPReid and ICFG-PEDES comprise identities from MSMT17, but their diversity is limited by the small number of unique persons. CUHK-PEDES, on the other hand, comprises 13,003 identities but has relatively shorter text descriptions on average. Further, these datasets are captured in restricted environments with a limited number of cameras. To further diversify the identities and provide dense captions, we propose a novel dataset called IIITD-20K. IIITD-20K comprises 20,000 unique identities captured in the wild and provides a rich dataset for text-to-image ReID.
We further synthetically generate images and fine-grained captions using Stable Diffusion and BLIP models trained on our dataset. We perform extensive experiments using state-of-the-art text-to-image ReID models and vision-language pretrained models and present a comprehensive analysis of the dataset. Our experiments also reveal that synthetically generated data leads to a substantial performance improvement in both same-dataset and cross-dataset settings.
URI:	http://repository.iiitd.edu.in/xmlui/handle/123456789/1492
Appears in Collections:	Year-2023

Files in This Item:

File	Description	Size	Format
Person_Re_ID - Vibhu Dubey.pdf	Restricted Access	415.12 kB	Adobe PDF
