Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1492
Title: Person Re-identification
Authors: Sundararajan, Niranjan
Dubey, Vibhu
Subramanyam, A V (Advisor)
Keywords: Text-based Person Re-Identification
Image-Text Retrieval
Meta-learning
Information Retrieval
Machine Learning
Issue Date: 11-Dec-2023
Publisher: IIIT-Delhi
Abstract: Image-Text Retrieval (ITR) is the task of retrieving an image from a corresponding textual description and/or a textual description from the corresponding image. Person Re-Identification (Person Re-ID) is a downstream task of ITR in which the images depict persons and the texts describe them. Our paper focuses only on Text-based Person Re-ID, that is, retrieving images from their textual descriptions. The key challenge in Person Re-ID is that the textual modality is feature-coarse, whereas the image modality is feature-dense, so the granularity gap between the two modalities is large. In addition, images and texts are inherently different kinds of data, which leads to a large modality gap. Feature learning therefore becomes difficult. Another problem in this domain is the shortage of datasets, primarily due to the privacy concerns pedestrians have about being photographed. A possible solution is to train on a combination of datasets, using meta-learning to learn across datasets while retaining model robustness. In this paper, we aim to develop a model that can learn effectively despite the image-text granularity gap while using meta-learning to incorporate multiple datasets in its training. CUHK-PEDES, RSTPReid and ICFG-PEDES are the three available benchmarks for evaluating text-to-image (T2I) ReID methods. RSTPReid and ICFG-PEDES comprise identities from MSMT17, but their diversity is limited by the small number of unique persons. CUHK-PEDES, on the other hand, comprises 13,003 identities but has relatively shorter text descriptions on average. Further, these datasets are captured in restricted environments with a limited number of cameras. To further diversify the identities and provide dense captions, we propose a novel dataset called IIITD-20K. IIITD-20K comprises 20,000 unique identities captured in the wild and provides a rich dataset for text-to-image ReID.
We further synthetically generate images and fine-grained captions using Stable Diffusion and BLIP models trained on our dataset. We perform extensive experiments using state-of-the-art text-to-image ReID models and vision-language pretrained models and present a comprehensive analysis of the dataset. Our experiments also reveal that synthetically generated data leads to a substantial performance improvement in both same-dataset and cross-dataset settings.
URI: http://repository.iiitd.edu.in/xmlui/handle/123456789/1492
Appears in Collections: Year-2023

Files in This Item:
File: Person_Re_ID - Vibhu Dubey.pdf (Restricted Access)
Size: 415.12 kB
Format: Adobe PDF

