IIIT-Delhi Institutional Repository

Phoneme-based language translation for speech synthesis using sparse matrix representations

dc.contributor.author Patil, Akshet
dc.contributor.author Abrol, Vinayak (Advisor)
dc.date.accessioned 2026-04-03T06:52:02Z
dc.date.available 2026-04-03T06:52:02Z
dc.date.issued 2025-05
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1830
dc.description.abstract This thesis introduces a novel framework for language translation, transitioning from conventional text-based mapping to a phoneme-level modeling approach. By employing articulatory phoneme representations and sparse binary matrices, the proposed architecture effectively aligns source and target languages at the phoneme level, leveraging a transformer-based encoder-decoder framework. Data preparation involved aligning multilingual text corpora from sources such as Mozilla Common Voice and CVSS, followed by phoneme extraction using tools like eSpeak NG. A distinctive aspect of this work is the development of a phoneme dictionary, constructed by grouping phoneme rows into word-like segments, resulting in a 10 times more expressive vocabulary than conventional row-level mappings. The proposed pipeline demonstrated a 35% improvement in phoneme alignment accuracy, alongside a substantial enhancement in speech intelligibility, achieved through mel-spectrogram generation from articulatory matrices and synthesis via a GAN-based vocoder. This approach simplifies word boundary modeling and lays the groundwork for speech-to-phoneme translation and multilingual adaptation in low-resource settings. This work establishes a transformative direction in language translation, integrating phonological structure with advanced sequence modeling, offering significant implications for text-to-speech (TTS), cross-lingual speech generation, and direct speech-to-speech translation systems. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Phoneme en_US
dc.subject Audio Processing en_US
dc.subject Phoneme-level modeling en_US
dc.subject Transformer encoder-decoder en_US
dc.subject Articulatory features en_US
dc.subject Multilingual text-to-speech (TTS) en_US
dc.subject Speech-to-phoneme translation en_US
dc.subject GAN-based vocoder en_US
dc.title Phoneme-based language translation for speech synthesis using sparse matrix representations en_US
dc.type Thesis en_US

