IIIT-Delhi Institutional Repository

Acoustic cues for multilingual abuse detection

Show simple item record

dc.contributor.author Thakran, Yash
dc.contributor.author Abrol, Vinayak (Advisor)
dc.date.accessioned 2024-05-20T10:52:55Z
dc.date.available 2024-05-20T10:52:55Z
dc.date.issued 2023-05-09
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1542
dc.description.abstract Abusive content on social media platforms is undesirable as it impedes healthy and safe social media interactions. Abuse detection in spoken content can be addressed by performing Automatic Speech Recognition (ASR) and leveraging advances in natural language processing. However, ASR models introduce latency and often perform poorly on abusive words, which are underrepresented in training corpora and frequently not spoken clearly or in full. While automatic abuse detection has been widely explored in the textual domain, detection directly from audio remains largely unexplored, mainly owing to the lack of audio datasets. This work focuses on audio abuse detection from an acoustic-cue perspective in a multilingual social media setting. We use ADIMA, a linguistically diverse, ethically sourced, expert-annotated, and well-balanced multilingual abuse detection audio dataset comprising 11,775 audio samples in 10 Indic languages, spanning 65 hours and spoken by 6,446 unique users. Our key hypothesis is that abusive behavior leaves distinct acoustic cues, which can help detect abuse directly from audio signals without the need to transcribe them. We first demonstrate that employing a generic large pre-trained acoustic/language model is suboptimal, which suggests that incorporating the right acoustic cues is the way forward to improve performance and achieve generalization. Our proposed method explicitly focuses on two modalities: the underlying emotions expressed in the audio and its language features. On the recently proposed ADIMA benchmark for this task, our approach achieves state-of-the-art performance of 96% on the test set and outperforms the existing best models by a large margin. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject abusive content detection en_US
dc.subject multilingual audio analysis en_US
dc.subject abusive speech detection en_US
dc.subject multimodal abuse detection en_US
dc.subject multilingual abuse detection en_US
dc.subject speech processing en_US
dc.subject transfer learning en_US
dc.title Acoustic cues for multilingual abuse detection en_US
dc.type Other en_US
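The abstract describes a two-modality approach: embeddings capturing the emotions expressed in a clip are combined with language features of the audio, and the fused representation is classified as abusive or not. The following is a minimal late-fusion sketch of that idea, not the thesis's actual model; the function name, the use of a single linear classifier, and the assumption that both modality embeddings are precomputed fixed-length vectors are all illustrative simplifications.

```python
import numpy as np

def fuse_and_score(emotion_emb, language_emb, w, b):
    """Late fusion: concatenate the two modality embeddings and apply a
    linear classifier with sigmoid output.

    emotion_emb  -- 1-D array from a (hypothetical) emotion-recognition model
    language_emb -- 1-D array of language features from the same audio clip
    w, b         -- classifier weights (len(w) == len(emotion_emb) + len(language_emb))

    Returns the estimated probability that the clip contains abuse.
    """
    x = np.concatenate([emotion_emb, language_emb])  # fused representation
    logit = float(w @ x + b)
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> probability in (0, 1)
```

In practice the weights would be learned on labeled data such as ADIMA, and each embedding would come from a pre-trained encoder rather than being supplied directly.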

