IIIT-Delhi Institutional Repository

Acoustic cues for multilingual abuse detection

Show simple item record

dc.contributor.author Thakran, Yash
dc.contributor.author Abrol, Vinayak (Advisor)
dc.date.accessioned 2024-05-20T10:52:55Z
dc.date.available 2024-05-20T10:52:55Z
dc.date.issued 2023-05-09
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1542
dc.description.abstract Abusive content on social media platforms is undesirable as it impedes healthy and safe social media interactions. Abuse detection in spoken content can be addressed by performing Automatic Speech Recognition (ASR) and leveraging advances in natural language processing. However, ASR models introduce latency and often perform poorly on abusive words, which are underrepresented in training corpora and frequently not spoken clearly or in full. While automatic abuse detection has been widely explored in the textual domain, detection directly from audio remains largely unexplored, mainly owing to the lack of audio datasets. This work focuses on audio abuse detection from an acoustic-cue perspective in a multilingual social media setting. We use ADIMA, a linguistically diverse, ethically sourced, expert-annotated, and well-balanced multilingual abuse detection audio dataset comprising 11,775 audio samples in 10 Indic languages, spanning 65 hours and spoken by 6,446 unique users. Our key hypothesis is that abusive behavior leaves distinct acoustic cues, which can help detect abuse directly from audio signals without the need to transcribe them. We first demonstrate that employing a generic large pre-trained acoustic/language model is suboptimal, which suggests that incorporating the right acoustic cues is the way forward to improve performance and achieve generalization. Our proposed method explicitly focuses on two modalities: the underlying emotions expressed in the audio and its language features. On the recently proposed ADIMA benchmark for this task, our approach achieves state-of-the-art performance of 96% on the test set and outperforms the existing best models by a large margin. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject abusive content detection en_US
dc.subject multilingual audio analysis en_US
dc.subject abusive speech detection en_US
dc.subject multimodal abuse detection en_US
dc.subject multilingual abuse detection en_US
dc.subject speech processing en_US
dc.subject transfer learning en_US
dc.title Acoustic cues for multilingual abuse detection en_US
dc.type Other en_US
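The abstract describes a two-modality approach: embeddings capturing the emotions expressed in a clip are combined with language features of the audio, and the fused representation is classified as abusive or not. The following is a minimal late-fusion sketch of that idea, not the thesis's actual model; the function name, the use of a single linear classifier, and the assumption that both modality embeddings are precomputed fixed-length vectors are all illustrative simplifications.

```python
import numpy as np

def fuse_and_score(emotion_emb, language_emb, w, b):
    """Late fusion: concatenate the two modality embeddings and apply a
    linear classifier with sigmoid output.

    emotion_emb  -- 1-D array from a (hypothetical) emotion-recognition model
    language_emb -- 1-D array of language features from the same audio clip
    w, b         -- classifier weights (len(w) == len(emotion_emb) + len(language_emb))

    Returns the estimated probability that the clip contains abuse.
    """
    x = np.concatenate([emotion_emb, language_emb])  # fused representation
    logit = float(w @ x + b)
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> probability in (0, 1)
```

In practice the weights would be learned on labeled data such as ADIMA, and each embedding would come from a pre-trained encoder rather than being supplied directly.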

