IIIT-Delhi Institutional Repository

Towards green and inclusive speech processing: understanding and responsibly mitigating linguistic and accent biases


dc.contributor.author Sharma, V. Divya
dc.contributor.author Gupta, Anubha (Advisor)
dc.date.accessioned 2026-05-12T11:24:08Z
dc.date.available 2026-05-12T11:24:08Z
dc.date.issued 2026-04
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1966
dc.description.abstract High-quality synthetic speech has transformative potential for accessibility, education, entertainment, and personalized human–computer interaction. However, it also poses serious risks: synthetic voices can be exploited for audio deepfakes and impersonation attacks. These risks are magnified in multilingual and low-resource settings, where audio deepfake detection (ADD) and speaker verification (SV) systems exhibit pronounced linguistic biases, and the scarcity of large-scale, publicly available datasets limits the development of robust, fair, and inclusive models. Moreover, existing methods for evaluating synthetic speech quality rely primarily on human studies, which are costly, difficult to scale, and often lack reproducibility. Additionally, synthetic speech generation models incur significant carbon emissions, yet environmental sustainability remains largely overlooked. Together, these challenges highlight a critical need for datasets, evaluation frameworks, and bias-mitigation methods that can enable responsible, inclusive, and environmentally conscious speech technologies. To address these gaps, this thesis makes the following key contributions: First, we introduce IndicSynth, a large-scale synthetic speech dataset covering 12 low-resource Indian languages to support multilingual ADD and anti-spoofing research. IndicSynth balances realistic voice mimicry and synthetic diversity. Using IndicSynth, we demonstrate the vulnerability of existing ADD and SV models against synthetic speech attacks. Human evaluation further validates the dataset quality, underscoring the dataset’s utility for security-focused applications. Second, we present Task-Lens, a cross-task profiling framework to mitigate task-resource gaps for underrepresented languages. Using Task-Lens, we profile 34 Indian speech datasets, including IndicSynth, covering 26 languages and eight downstream tasks, based on available metadata. 
Third, we propose FAtNet and EcoSpeak, which are cost-efficient methods for mitigating linguistic biases in speaker verification, addressing fully and partially cross-lingual scenarios while incorporating Green AI principles by reporting carbon emissions. Finally, we introduce GreenVoice, an automated environment-aware evaluation framework for synthetic speech generation models. GreenVoice cost-effectively highlights high-performing and sustainable generation models for large-scale synthetic speech dataset creation, thus enabling multilingual ADD and anti-spoofing research across more underrepresented languages and accents, beyond IndicSynth. Together, these contributions provide the foundations for building and evaluating speech technologies that are robust, equitable, and inclusive across languages and accents, while promoting environmentally responsible practices and supporting their reliable use in real-world applications. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Inclusive AI en_US
dc.subject Green AI en_US
dc.subject Bias Mitigation en_US
dc.subject Responsible AI en_US
dc.subject Synthetic Speech en_US
dc.title Towards green and inclusive speech processing: understanding and responsibly mitigating linguistic and accent biases en_US
dc.type Thesis en_US

