IIIT-Delhi Institutional Repository

VT-Nood: Vision transformer based near-OOD detection in fine-grained datasets


dc.contributor.author Dey, Kaushik
dc.contributor.author Prasad, Ranjitha (Advisor)
dc.date.accessioned 2024-09-21T09:54:17Z
dc.date.available 2024-09-21T09:54:17Z
dc.date.issued 2024-08-01
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1675
dc.description.abstract Machine learning models output high confidence scores when given input samples drawn from the training data distribution, referred to as in-distribution (ID) samples. However, when presented with an out-of-distribution (OOD) sample at inference time, the same model often produces unreliable confidence scores, raising questions about the interpretability and reliability of these models. In particular, ML models typically fail to detect near-OOD samples: samples that are perceptually similar but semantically dissimilar, closely resembling the input distribution except for fine-grained variations. We address such fine-grained variations using a Vision Transformer (ViT), capturing patch-level correlations through its self-attention mechanism. The ViT serves as the encoder of our VAE architecture, alongside a neural network that performs patch-level disentanglement; disentangling at the patch level with a ViT encoder in turn disentangles the latent factors common to the entire image. To the best of our knowledge, this is the first work to use a ViT in an encoder-decoder architecture for OOD detection. Through experiments on fine-grained datasets such as Oxford Flowers-102 and the CUB-200 Birds dataset, we demonstrate that the proposed method outperforms OOD-aware baselines, surpassing both density-based and classification-based methods on OOD detection metrics such as AUC, AUPR, and FPR@95. We also demonstrate that training the entire network with an SRGAN decoder, using a combination of mean-squared-error loss and perceptual loss, learns better representations for density-based methods. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject ViT and OOD en_US
dc.subject ESRGAN Decoder en_US
dc.subject VT-NOOD en_US
dc.title VT-Nood: Vision transformer based near-OOD detection in fine-grained datasets en_US
dc.type Thesis en_US
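
The abstract above describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of the core idea as the abstract states it: a VAE whose encoder is a Vision Transformer, with per-patch variational heads producing patch-level latent factors, trained with an MSE-plus-perceptual reconstruction loss. Everything here, including the class name PatchViTVAE, the layer sizes, the toy per-patch MLP decoder (which stands in for the SRGAN-style decoder the thesis uses), and the loss weights, is an illustrative assumption rather than the thesis code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchViTVAE(nn.Module):
    """Toy ViT-encoder VAE with one latent vector per image patch."""
    def __init__(self, img_size=224, patch=16, d_model=256, z_dim=32,
                 depth=6, heads=8):
        super().__init__()
        self.patch = patch
        self.n_patches = (img_size // patch) ** 2
        # Standard ViT patch embedding: a strided convolution.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches, d_model))
        layer = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Per-patch variational heads: self-attention mixes information
        # across patches, then each patch token gets its own (mu, logvar).
        self.to_mu = nn.Linear(d_model, z_dim)
        self.to_logvar = nn.Linear(d_model, z_dim)
        # Toy decoder reconstructing each patch from its latent; the thesis
        # uses an SRGAN-style decoder in this role instead.
        self.decode_patch = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, patch * patch * 3))

    def forward(self, x):
        b = x.size(0)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos
        h = self.encoder(tokens)                        # (b, n_patches, d_model)
        mu, logvar = self.to_mu(h), self.to_logvar(h)   # patch-level latents
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        flat = self.decode_patch(z)                     # (b, n_patches, p*p*3)
        side = int(self.n_patches ** 0.5)
        recon = (flat.view(b, side, side, 3, self.patch, self.patch)
                     .permute(0, 3, 1, 4, 2, 5)
                     .reshape(b, 3, side * self.patch, side * self.patch))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar, feat):
    """MSE + perceptual + KL; `feat` is any frozen feature extractor
    (e.g. truncated VGG features). The 0.1 and 0.01 weights are placeholders."""
    mse = F.mse_loss(recon, x)
    perceptual = F.mse_loss(feat(recon), feat(x))
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + 0.1 * perceptual + 0.01 * kl

The loss pairs a mean-squared-error reconstruction term with a perceptual term computed in the feature space of a frozen network, mirroring the MSE-plus-perceptual combination the abstract reports for training with the SRGAN decoder.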
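The abstract evaluates with AUC, AUPR, and FPR@95. As a reference, here is a short, self-contained sketch of how these three numbers are conventionally computed from scalar OOD scores (higher means "more OOD") using scikit-learn; the Gaussian score arrays at the bottom are placeholder data, not results from the thesis.

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, roc_curve

def ood_metrics(id_scores, ood_scores):
    # Label OOD as the positive class (1) and pool the scores.
    y = np.concatenate([np.zeros(len(id_scores)), np.ones(len(ood_scores))])
    s = np.concatenate([id_scores, ood_scores])
    auc = roc_auc_score(y, s)                    # area under the ROC curve
    aupr = average_precision_score(y, s)         # area under precision-recall
    fpr, tpr, _ = roc_curve(y, s)
    fpr95 = fpr[np.searchsorted(tpr, 0.95)]      # false-positive rate at 95% TPR
    return auc, aupr, fpr95

# Placeholder scores: two Gaussians stand in for real model outputs.
rng = np.random.default_rng(0)
print(ood_metrics(rng.normal(0.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000)))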

