IIIT-Delhi Institutional Repository

Multi-modal fusion transformer for understanding digital advertisements

dc.contributor.author Khurana, Varun
dc.contributor.author Shah, Rajiv Ratn (Advisor)
dc.date.accessioned 2025-06-19T12:25:21Z
dc.date.available 2025-06-19T12:25:21Z
dc.date.issued 2023-05-10
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1750
dc.description.abstract In today’s world, digital-born media, especially advertisements, have a substantial influence on our daily lives, from persuading us to buy particular brands to creating awareness about social or environmental causes. This work proposes LearnAd, a learning method for the challenging task of understanding advertisements. Marketing graphics such as advertisements are digital-born and multimodal (they contain both text and visual content), and they employ rhetorical devices such as emotions, symbolism, and slogans to convey meaning. However, most current work in visual content understanding focuses on camera-shot images, and it does not translate well to marketing graphics. To address this gap, we propose using human content-interaction patterns, in the form of eye movements, to fine-tune the understanding of a Vision Transformer (ViT). This helps LearnAd, a multimodal transformer-based cross-attention model, achieve state-of-the-art results on three advertisement understanding tasks: generating the action that an ad persuades a user to take and the reason it gives for that action (the what-why of the ad), and predicting the sentiment and the topic of the advertisement image. Although real customer gaze patterns over marketing images are not available, LearnAd reaches this performance by using generated human saliency patterns instead. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Digital advertisements en_US
dc.subject Digital marketing en_US
dc.subject Multi-modal content understanding en_US
dc.subject Advertisement understanding en_US
dc.subject Transformer en_US
dc.subject Cross attention en_US
dc.title Multi-modal fusion transformer for understanding digital advertisements en_US
dc.type Other en_US
