Multimodal sarcasm explanation

Desai, Poorav; Akhtar, Md. Shad (Advisor); Chakraborty, Tanmoy (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1025

Full metadata record

DC Field	Value	Language
dc.contributor.author	Desai, Poorav	-
dc.contributor.author	Akhtar, Md. Shad (Advisor)	-
dc.contributor.author	Chakraborty, Tanmoy (Advisor)	-
dc.date.accessioned	2022-04-05T07:23:13Z	-
dc.date.available	2022-04-05T07:23:13Z	-
dc.date.issued	2021-08	-
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1025	-
dc.description.abstract	Sarcasm is a pervasive linguistic phenomenon and highly challenging to explain due to its subjectivity, lack of context and deeply-felt opinion. In the multimodal setup, sarcasm is conveyed through the incongruity between the text and visual entities. Although recent approaches treat it as a classification problem, it remains unclear why an online post is identified as sarcastic. Without proper explanation, end users may not be able to perceive the underlying use of irony. In this paper, we propose a novel problem, Multimodal Sarcasm Explanation (MSE): given a multimodal sarcastic post containing an image and a caption, we aim to generate a natural language explanation that reveals the intended sarcasm. To this end, we develop a novel dataset, MORE, with explanations for 3510 sarcastic multimodal posts. Each explanation is a natural language (English) sentence that describes the hidden irony. We then propose EXMORE, a multimodal transformer-based architecture to address MSE. It incorporates cross-modal attention in the transformer’s encoder, which attends to the distinguishing features between the two modalities. Subsequently, a BART-based auto-regressive decoder is used as the generator. Empirical results demonstrate the efficacy of EXMORE over six baselines (adopted for MSE) and show > 10% improvement compared to the best baseline across five evaluation metrics.	en_US
dc.language.iso	en	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Sarcasm Detection	en_US
dc.subject	OCR Extraction	en_US
dc.subject	Image Encoder	en_US
dc.subject	Text Encoder	en_US
dc.subject	Cross-modal Encoder	en_US
dc.title	Multimodal sarcasm explanation	en_US
dc.type	Thesis	en_US
Appears in Collections:	Year-2021
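
The abstract describes cross-modal attention, in which features from one modality attend over features from the other. The following is a minimal sketch of that general idea only, assuming plain scaled dot-product attention with text tokens as queries and image regions as keys and values. It is not the EXMORE architecture: the function names, the toy feature values, and the absence of learned projections and multiple heads are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(text_feats, image_feats):
    """Let each text token attend over all image-region features.

    text_feats:  list of vectors used as queries, each of length d
    image_feats: list of vectors used as keys and values, each of length d
    Returns (attended, weights): one context vector and one row of
    attention weights per text token.
    """
    d = len(text_feats[0])
    attended, weights = [], []
    for q in text_feats:
        # Scaled dot-product score between this text token and each image region
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in image_feats
        ]
        w = softmax(scores)
        # Context vector: attention-weighted sum of the image features
        ctx = [sum(wj * v[i] for wj, v in zip(w, image_feats)) for i in range(d)]
        attended.append(ctx)
        weights.append(w)
    return attended, weights


# Toy features (hypothetical values, not taken from the MORE dataset)
text_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = cross_modal_attention(text_feats, image_feats)
```

Each row of `weights` sums to 1 and indicates how strongly a text token draws on each image region. A trained model would learn separate query, key and value projections rather than comparing raw features directly.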

Files in This Item:

File	Description	Size	Format
Poorav_MTP_2021.pdf		1.24 MB	Adobe PDF

