IIIT-Delhi Institutional Repository

In-domain conquers commonsense knowledge: leveraging domain-specific toxicity attributes for explanation generation of implied hate speech

dc.contributor.author Yadav, Neemesh
dc.contributor.author Akhtar, Md. Shad (Advisor)
dc.date.accessioned 2024-05-21T09:35:11Z
dc.date.available 2024-05-21T09:35:11Z
dc.date.issued 2023-11-29
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1554
dc.description.abstract Recognizing subtle and implied forms of hate speech is challenging. Fine-tuning pre-trained language models (PLMs) to generate an explanation for an incoming implicit statement has become an active area of research. Moreover, the practice of fine-tuning PLMs with commonsense knowledge is also rising. Interestingly, this study finds contradictory evidence for the role of the quality of knowledge graph (KG) tuples in generating implicit explanations. Across two datasets and KGs, we observe that replacing top-k KG tuples with the respective bottom-k or random-k set does not always lead to the expected performance deterioration. Our investigation further reveals that this behavior arises from the de-facto (task-independent) manner of extracting/retrieving the KG tuples. Intrigued by this, we explore other forms of external (task-dependent) signals that can benefit implicit hate explanation systems. Our findings indicate that a simpler model incorporating these attributes can achieve comparable or better results than KG-based systems. We evaluate our proposed system on the SBIC and LatentHatred datasets. Compared to the KG-infused baseline, we observe gains of +5.93 (+0.49), +6.05 (-1.56), and +3.52 (+0.77) in BLEU, Rouge-L, and BERTScore on SBIC (LatentHatred). Following this, we conduct a human evaluation and observe that the proposed method produces semantically richer and more precise explanations than zero-shot GPT-3.5. We conclude with a discussion of errors originating at both the modeling and dataset levels to highlight the intricate nature of the task. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Hate Speech en_US
dc.subject Explaining Hate Speech en_US
dc.subject Computational Social Science en_US
dc.subject In-context Learning en_US
dc.subject Explainability en_US
dc.subject NLP For Social Good en_US
dc.title In-domain conquers commonsense knowledge: leveraging domain-specific toxicity attributes for explanation generation of implied hate speech en_US
dc.type Other en_US