Improving speech to speech conversion with incorporation of speaker tonality

Goel, Arnav; Hira, Medha; Gupta, Anubha (Advisor)

dc.contributor.author	Goel, Arnav
dc.contributor.author	Hira, Medha
dc.contributor.author	Gupta, Anubha (Advisor)
dc.date.accessioned	2024-05-13T13:46:51Z
dc.date.available	2024-05-13T13:46:51Z
dc.date.issued	2023-11-29
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1456
dc.description.abstract	We benchmarked various Speech-to-Text (STT) and Text-to-Speech (TTS) models and performed an extensive literature review on downstream tasks such as Automatic Speech Recognition (ASR), Speech Emotion Recognition, Speaker Identification and Prosody Transfer. This helped us understand the existing paradigms in audio processing and enabled us to work on speech processing and synthesis tasks. We built a novel multilingual speech-to-speech system with translation using state-of-the-art ASR, TTS and Voice Conversion models. This allowed us to experiment with speaker embedding conditioning in TTS systems and to explore posterior and prior conditions. We present the results in this report.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Automatic Speech Recognition	en_US
dc.subject	Multilingual	en_US
dc.subject	Speech	en_US
dc.subject	Speaker Embeddings	en_US
dc.subject	Text-to-Speech	en_US
dc.title	Improving speech to speech conversion with incorporation of speaker tonality	en_US
dc.type	Other	en_US