IIIT-Delhi Institutional Repository

Improving speech to speech conversion with incorporation of speaker tonality


dc.contributor.author Goel, Arnav
dc.contributor.author Hira, Medha
dc.contributor.author Gupta, Anubha (Advisor)
dc.date.accessioned 2024-05-13T13:46:51Z
dc.date.available 2024-05-13T13:46:51Z
dc.date.issued 2023-11-29
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1456
dc.description.abstract We benchmarked various Speech-to-Text (STT) and Text-to-Speech (TTS) models and performed an extensive literature review on downstream tasks such as Automatic Speech Recognition (ASR), Speech Emotion Recognition, Speaker Identification and Prosody Transfer. This helped us understand the existing paradigms in audio processing and enabled us to work on speech processing and synthesis tasks. We prepared a novel multilingual speech-to-speech system with translation using state-of-the-art ASR, TTS and Voice Conversion models. This allowed us to experiment with speaker embedding conditioning in TTS systems and explore posterior and prior conditions. We present the results in this report. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Automatic Speech Recognition en_US
dc.subject Multilingual en_US
dc.subject Speech en_US
dc.subject Speaker Embeddings en_US
dc.subject Text-to-Speech en_US
dc.title Improving speech to speech conversion with incorporation of speaker tonality en_US
dc.type Other en_US

