Phoneme-based language translation for speech synthesis using sparse matrix representations

Patil, Akshet; Abrol, Vinayak (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1830

Full metadata record

DC Field	Value	Language
dc.contributor.author	Patil, Akshet	-
dc.contributor.author	Abrol, Vinayak (Advisor)	-
dc.date.accessioned	2026-04-03T06:52:02Z	-
dc.date.available	2026-04-03T06:52:02Z	-
dc.date.issued	2025-05	-
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1830	-
dc.description.abstract	This thesis introduces a novel framework for language translation, transitioning from conventional text-based mapping to a phoneme-level modeling approach. By employing articulatory phoneme representations and sparse binary matrices, the proposed architecture effectively aligns source and target languages at the phoneme level, leveraging a transformer-based encoder-decoder framework. Data preparation involved aligning multilingual text corpora from sources such as Mozilla Common Voice and CVSS, followed by phoneme extraction using tools like eSpeak NG. A distinctive aspect of this work is the development of a phoneme dictionary, constructed by grouping phoneme rows into word-like segments, resulting in a 10 times more expressive vocabulary than conventional row-level mappings. The proposed pipeline demonstrated a 35% improvement in phoneme alignment accuracy, alongside a substantial enhancement in speech intelligibility, achieved through mel-spectrogram generation from articulatory matrices and synthesis via a GAN-based vocoder. This approach simplifies word boundary modeling and lays the groundwork for speech-to-phoneme translation and multilingual adaptation in low-resource settings. This work establishes a transformative direction in language translation, integrating phonological structure with advanced sequence modeling, offering significant implications for text-to-speech (TTS), cross-lingual speech generation, and direct speech-to-speech translation systems.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Phoneme	en_US
dc.subject	Audio Processing	en_US
dc.subject	Phoneme-level modeling	en_US
dc.subject	Transformer encoder-decoder	en_US
dc.subject	Articulatory features	en_US
dc.subject	Multilingual text-to-speech (TTS)	en_US
dc.subject	Speech-to-phoneme translation	en_US
dc.subject	GAN-based vocoder	en_US
dc.title	Phoneme-based language translation for speech synthesis using sparse matrix representations	en_US
dc.type	Thesis	en_US
Appears in Collections:	Year-2025
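
Illustrative sketch (not taken from the thesis): the abstract describes encoding phonemes as rows of a sparse binary articulatory matrix and building a phoneme dictionary by grouping those rows into word-like segments. The Python below shows one way such a representation could look. The feature inventory, the phoneme-to-feature table, and every function name are hypothetical placeholders chosen for illustration; the thesis's actual feature set, grouping rule, and tooling (for example phoneme extraction with eSpeak NG) are not reproduced here.

```python
# Hypothetical articulatory feature inventory (an assumption, not the thesis's feature set).
FEATURES = ["voiced", "nasal", "plosive", "fricative", "bilabial",
            "alveolar", "vowel", "front", "back", "open"]

# Hypothetical phoneme -> active-feature table for a handful of symbols.
PHONEME_FEATURES = {
    "m": {"voiced", "nasal", "bilabial"},
    "p": {"plosive", "bilabial"},
    "t": {"plosive", "alveolar"},
    "s": {"fricative", "alveolar"},
    "a": {"voiced", "vowel", "open"},
    "i": {"voiced", "vowel", "front"},
}


def phoneme_row(phoneme):
    """Sparse binary row: the column indices whose value is 1 for this phoneme."""
    active = PHONEME_FEATURES[phoneme]
    return tuple(i for i, name in enumerate(FEATURES) if name in active)


def to_dense(row):
    """Expand a sparse row into a full 0/1 vector over FEATURES."""
    return [1 if i in row else 0 for i in range(len(FEATURES))]


def segment_rows(words):
    """Group phoneme rows into word-like segments, one tuple of rows per word."""
    return [tuple(phoneme_row(p) for p in word) for word in words]


def build_dictionary(segments):
    """Assign an integer id to each distinct segment, in order of first appearance."""
    vocab = {}
    for segment in segments:
        vocab.setdefault(segment, len(vocab))
    return vocab


# Toy utterance: three word-like segments, the first and last identical.
utterance = [["m", "a", "p"], ["s", "i", "t"], ["m", "a", "p"]]
segments = segment_rows(utterance)
vocab = build_dictionary(segments)
token_ids = [vocab[segment] for segment in segments]
```

In this toy setup each dictionary entry is a whole sequence of rows rather than a single row, so repeated word-like segments share one id; a sequence model would then operate on token_ids instead of on individual phoneme rows.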

Files in This Item:

File	Description	Size	Format
MT23155_Akshet Patial.pdf		2.24 MB	Adobe PDF
