Please use this identifier to cite or link to this item:
http://repository.iiitd.edu.in/xmlui/handle/123456789/1830

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Patil, Akshet | - |
| dc.contributor.author | Abrol, Vinayak (Advisor) | - |
| dc.date.accessioned | 2026-04-03T06:52:02Z | - |
| dc.date.available | 2026-04-03T06:52:02Z | - |
| dc.date.issued | 2025-05 | - |
| dc.identifier.uri | http://repository.iiitd.edu.in/xmlui/handle/123456789/1830 | - |
| dc.description.abstract | This thesis introduces a framework for language translation that shifts from conventional text-based mapping to phoneme-level modeling. Using articulatory phoneme representations encoded as sparse binary matrices, the proposed architecture aligns source and target languages at the phoneme level with a transformer-based encoder-decoder. Data preparation involved aligning multilingual text corpora from sources such as Mozilla Common Voice and CVSS, followed by phoneme extraction with tools such as eSpeak NG. A distinctive aspect of this work is a phoneme dictionary built by grouping phoneme rows into word-like segments, which yields a vocabulary ten times more expressive than conventional row-level mappings. The proposed pipeline achieved a 35% improvement in phoneme alignment accuracy, along with a substantial gain in speech intelligibility, obtained by generating mel-spectrograms from the articulatory matrices and synthesizing audio with a GAN-based vocoder. This approach simplifies word-boundary modeling and lays the groundwork for speech-to-phoneme translation and multilingual adaptation in low-resource settings. By integrating phonological structure with sequence modeling, this work opens a new direction in language translation, with implications for text-to-speech (TTS), cross-lingual speech generation, and direct speech-to-speech translation systems. | en_US |
| dc.language.iso | en_US | en_US |
| dc.publisher | IIIT-Delhi | en_US |
| dc.subject | Phoneme | en_US |
| dc.subject | Audio Processing | en_US |
| dc.subject | Phoneme-level modeling | en_US |
| dc.subject | Transformer encoder-decoder | en_US |
| dc.subject | Articulatory features | en_US |
| dc.subject | Multilingual text-to-speech (TTS) | en_US |
| dc.subject | Speech-to-phoneme translation | en_US |
| dc.subject | GAN-based vocoder | en_US |
| dc.title | Phoneme-based language translation for speech synthesis using sparse matrix representations | en_US |
| dc.type | Thesis | en_US |
| Appears in Collections: | Year-2025 | |
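The abstract describes encoding each phoneme as a row of binary articulatory features and then grouping those rows into word-like segments to form a phoneme dictionary. The Python sketch below illustrates that idea under stated assumptions: the feature inventory (`FEATURES`), the phoneme table (`PHONEME_FEATURES`), and the helper names are hypothetical, since this record does not give the thesis's actual feature set, matrix layout, or dictionary construction. Input is assumed to be already phonemized and split at word boundaries (for example, IPA output from eSpeak NG), and each sparse binary matrix is stored in a CSR-like form where every stored value is implicitly 1.

```python
from typing import Dict, List, Tuple

# Hypothetical articulatory feature inventory (illustrative assumption only).
FEATURES: List[str] = [
    "consonant", "vowel", "voiced", "bilabial", "alveolar", "velar",
    "plosive", "fricative", "nasal", "high", "low", "front", "back", "rounded",
]
FEATURE_INDEX: Dict[str, int] = {name: i for i, name in enumerate(FEATURES)}

# Hypothetical phoneme-to-feature table covering a handful of IPA symbols.
PHONEME_FEATURES: Dict[str, Tuple[str, ...]] = {
    "p": ("consonant", "bilabial", "plosive"),
    "b": ("consonant", "voiced", "bilabial", "plosive"),
    "m": ("consonant", "voiced", "bilabial", "nasal"),
    "t": ("consonant", "alveolar", "plosive"),
    "d": ("consonant", "voiced", "alveolar", "plosive"),
    "n": ("consonant", "voiced", "alveolar", "nasal"),
    "s": ("consonant", "alveolar", "fricative"),
    "k": ("consonant", "velar", "plosive"),
    "i": ("vowel", "voiced", "high", "front"),
    "u": ("vowel", "voiced", "high", "back", "rounded"),
    "a": ("vowel", "voiced", "low"),
}

# A word-like segment: one tuple of active feature columns per phoneme row.
Segment = Tuple[Tuple[int, ...], ...]


def to_sparse_rows(phonemes: List[str]) -> Tuple[List[int], List[int]]:
    """Encode phonemes as a CSR-style sparse binary matrix (one row per phoneme).

    Row r has ones at columns indices[indptr[r]:indptr[r + 1]]; every stored
    value is implicitly 1, so no separate data array is kept.
    """
    indptr: List[int] = [0]
    indices: List[int] = []
    for ph in phonemes:
        indices.extend(sorted(FEATURE_INDEX[f] for f in PHONEME_FEATURES[ph]))
        indptr.append(len(indices))
    return indptr, indices


def to_dense(indptr: List[int], indices: List[int]) -> List[List[int]]:
    """Expand the sparse rows into a dense 0/1 matrix for inspection."""
    dense = []
    for r in range(len(indptr) - 1):
        row = [0] * len(FEATURES)
        for c in indices[indptr[r]:indptr[r + 1]]:
            row[c] = 1
        dense.append(row)
    return dense


def build_segment_dictionary(words: List[List[str]]) -> Dict[Segment, int]:
    """Group each word's phoneme rows into one segment; give each distinct segment an id."""
    vocab: Dict[Segment, int] = {}
    for word in words:
        indptr, indices = to_sparse_rows(word)
        segment = tuple(
            tuple(indices[indptr[r]:indptr[r + 1]]) for r in range(len(word))
        )
        if segment not in vocab:
            vocab[segment] = len(vocab)
    return vocab


# Usage: four already-phonemized words, one of them repeated.
words = [["p", "i", "n"], ["t", "i", "n"], ["m", "a", "p"], ["p", "i", "n"]]
vocab = build_segment_dictionary(words)
print(len(vocab))  # 3 distinct word-like segments
```

Because the repeated word maps to the same segment id, the dictionary counts distinct word-like segments rather than individual rows. Keying segments on their feature rows, instead of on phoneme symbols, is a design choice of this sketch: two symbols with identical feature rows would share an entry.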
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| MT23155_Akshet Patial.pdf | | 2.24 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.