Abstract:
With the massive amount of online multimedia data (e.g., images, videos, text articles)
and the growing needs of users, venue discovery using multimedia data has
become a prominent research topic. In this study, we refer to business and travel
locations as venues and aim to improve the efficiency of venue discovery through
hashing. Previously, a lot of work has been done in the field of cross-modal retrieval
to reduce the heterogeneity gap between multiple modalities, so that samples
from those modalities can be compared directly. Such techniques have also been
applied to venue discovery. However, advances in technology have increased
the volume of multimedia data, making retrieval slower and more difficult.
Therefore, hashing techniques are being developed to project features from
different modalities into a common Hamming space. Hash codes require very little
storage space and can be compared faster than real-valued features using the
Hamming distance. In this thesis, we propose an adversarial learning-based approach
for generating hash codes for venue-related heterogeneous multimedia data to ease the
task of venue discovery without any location information.
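As a rough illustration of why compact binary codes speed up retrieval, the sketch below compares packed hash codes with the Hamming distance using bitwise operations; the 64-bit code length, the helper names, and the use of NumPy are illustrative assumptions rather than details of the proposed method.

import numpy as np

def pack_codes(bits):
    # Pack an (n, n_bits) array of {0, 1} bits into bytes, one row per sample.
    return np.packbits(bits.astype(np.uint8), axis=1)

def hamming_distance(query, database):
    # Count differing bits between one packed query code and every packed database code.
    xor = np.bitwise_xor(database, query)          # differing bits, byte by byte
    return np.unpackbits(xor, axis=1).sum(axis=1)  # popcount per database item

# Toy usage: 1000 database items and one query, each with a random 64-bit code.
rng = np.random.default_rng(0)
db = pack_codes(rng.integers(0, 2, size=(1000, 64)))
query = pack_codes(rng.integers(0, 2, size=(1, 64)))
top5 = np.argsort(hamming_distance(query, db))[:5]  # nearest codes by Hamming distance

Because the comparison reduces to XOR and bit counting over a few bytes per item, it is both faster and far more memory-efficient than distance computations over real-valued feature vectors.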
Previous works have shown the great ability of Generative Adversarial Networks
(GANs) to model the distribution of the data and learn discriminative representations.
We show how GANs can be used to learn to generate hash codes using category
and pairwise information that occurs naturally in the data. Most existing supervised
cross-modal hashing methods map data from different modalities to a Hamming space,
where semantic information is exploited to supervise data in different modalities
during the training stage. However, previous works neglect the pairwise similarity between
data in different modalities, which leads to degraded performance
when finding exact matches for queries. To address this issue, we propose a supervised
Generative Adversarial Cross-modal Hashing method by Transferring Pairwise
Similarities (SGACH-TPS). This work makes three significant contributions: i) we propose
a model for efficient venue discovery on a new dataset, WikiVenue, of
real-world images produced by people; ii) we design a supervised generative adversarial
network that constructs a hash function mapping multimodal image-text
pairs to a common Hamming space; and iii) we suggest a simple transfer training strategy for
the adversarial network to supervise samples from different modalities, in which
the pairwise similarity is transferred to the fine-tuning stage of training. To
show that our work generalizes to the broader field of cross-modal retrieval, we report experiments
on the benchmark datasets Wiki and NUS-WIDE.
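To make the pairwise-similarity supervision concrete, the following is a minimal, non-authoritative sketch of how relaxed binary codes from two modality networks could be trained with a category loss plus a pairwise similarity term; the layer sizes, feature dimensions, and exact loss form are assumptions for illustration, and the adversarial discriminator of SGACH-TPS is omitted here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HashNet(nn.Module):
    # Illustrative per-modality hash network; dimensions are assumptions.
    def __init__(self, in_dim, code_len=64, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_len), nn.Tanh())  # relaxed codes in (-1, 1)
        self.classifier = nn.Linear(code_len, n_classes)

    def forward(self, x):
        code = self.encoder(x)
        return code, self.classifier(code)

def pairwise_similarity_loss(img_codes, txt_codes, sim):
    # Negative log-likelihood of the cross-modal similarity matrix given code inner products.
    theta = img_codes @ txt_codes.t() / 2.0
    return (F.softplus(theta) - sim * theta).mean()

img_net, txt_net = HashNet(4096), HashNet(1000)
img_feat, txt_feat = torch.randn(8, 4096), torch.randn(8, 1000)
labels = torch.randint(0, 10, (8,))
sim = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()  # 1 if the pair shares a category

img_code, img_logits = img_net(img_feat)
txt_code, txt_logits = txt_net(txt_feat)
loss = (F.cross_entropy(img_logits, labels) + F.cross_entropy(txt_logits, labels)
        + pairwise_similarity_loss(img_code, txt_code, sim))
loss.backward()

In this sketch, the category loss supervises each modality individually, while the pairwise term pulls the relaxed codes of semantically similar image-text pairs together across modalities, which is the role pairwise similarity plays during the fine-tuning stage described above.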