dc.contributor.author | Goel, Arnav | |
dc.contributor.author | Hira, Medha | |
dc.contributor.author | Gupta, Anubha (Advisor) | |
dc.date.accessioned | 2024-05-13T13:46:51Z | |
dc.date.available | 2024-05-13T13:46:51Z | |
dc.date.issued | 2023-11-29 | |
dc.identifier.uri | http://repository.iiitd.edu.in/xmlui/handle/123456789/1456 | |
dc.description.abstract | We benchmarked various Speech-to-Text (STT) and Text-to-Speech (TTS) models and performed an extensive literature review on downstream tasks such as Automatic Speech Recognition (ASR), Speech Emotion Recognition, Speaker Identification and Prosody Transfer. This helped us understand the existing paradigms in audio processing and enabled us to work on speech processing and synthesis tasks. We built a novel multilingual speech-to-speech system with translation using state-of-the-art ASR, TTS and Voice Conversion models. This allowed us to experiment with speaker embedding conditioning in TTS systems and explore posterior and prior conditions. We present the results in this report. | en_US |
dc.language.iso | en_US | en_US |
dc.publisher | IIIT-Delhi | en_US |
dc.subject | Automatic Speech Recognition | en_US |
dc.subject | Multilingual | en_US |
dc.subject | Speech | en_US |
dc.subject | Speaker Embeddings | en_US |
dc.subject | Text-to-Speech | en_US |
dc.title | Improving speech to speech conversion with incorporation of speaker tonality | en_US |
dc.type | Other | en_US |