IIIT-Delhi Institutional Repository

Mitigating gender bias in speech-to-speech translation

dc.contributor.author Aggarwal, Aditya
dc.contributor.author Gupta, Anubha (Advisor)
dc.date.accessioned 2026-04-21T10:12:38Z
dc.date.available 2026-04-21T10:12:38Z
dc.date.issued 2024-12-13
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1951
dc.description.abstract Despite advances in Machine Translation (MT), translating gender-marked languages remains challenging. The challenge is further compounded by existing models that attempt to handle both co-reference resolution and translation simultaneously. Additionally, standard parallel MT datasets often default to masculine forms in Hindi, introducing a bias that limits MT models’ ability to accurately learn gendered translations. This paper investigates this problem using English (En) to Hindi (Hi) translation, a language pair with distinct gender-marked grammatical structures and addresses these challenges in the following ways: (1) This paper introduces the Speaker-Aware Gender Evaluation Corpus (SAGECorp), a synthetic dataset comprising 13,420 En-Hi sentence pairs, including contrastive gendered sentences for each pair. (2) To address the inefficiencies of existing models, this work proposes a lightweight, plug-and-play framework that leverages a small language model (SLM) as a post-processing solution to improve gender-aware translation. (3) To robustly measure the effectiveness of co-reference resolution in gender-aware models, a new metric, Weighted Gender Accuracy (WGA), is proposed. In the end, elaborate benchmarking of three small multilingual language models (LLAMA-3.2, Phi-3.5-mini, Gemma 2.0) has been carried out on the SAGECorp dataset on multiple metrics including our newly introduced WGA metric. The proposed framework demonstrates a 16% average improvement over the baseline translator in gender-aware translation when evaluated on the same dataset. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Speech Translation en_US
dc.subject Gender-aware Machine Translation en_US
dc.subject Gender Bias en_US
dc.subject Small Language Model en_US
dc.subject Neural Machine Translator en_US
dc.title Mitigating gender bias in speech-to-speech translation en_US
dc.type Other en_US

