IIIT-Delhi Institutional Repository

Capturing mannerisms in speech data

dc.contributor.author Nambiar, Pranav
dc.contributor.author Arora, Chetan (Advisor)
dc.contributor.author Anand, Saket (Advisor)
dc.date.accessioned 2019-10-07T07:05:22Z
dc.date.available 2019-10-07T07:05:22Z
dc.date.issued 2018-04-30
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/749
dc.description.abstract Text-to-speech systems generally require large amounts of annotated speech data, and the quality of both the annotations and the speech strongly affects the results. As a result, most research has been performed on highly curated data collected in labs. This is not a major obstacle when generating widely spoken varieties such as American English, but it becomes a problem when generating less common accents: it is not viable to assemble a focus group and build a dataset every time a new type of accented speech is needed. This report explores a new approach to generating less common accented speech, using a popular labelled dataset to learn the language and a second, unlabelled dataset of the accent we wish to learn. The approach uses generative adversarial networks (GANs), a new concept introduced by Ian Goodfellow in 2014. It aims to balance the trade-off between comprehensibility and accent replication by using an automatic speech recognition (ASR) system in conjunction with a discriminator trained to recognize the accent we wish to recreate. The balance is maintained by training the generative model on a weighted sum of the two error functions. We intend to use the proposed model to recreate Chinese-accented English as a sanity check and to show the correctness of our model, after which we will tackle more obscure tasks such as animal-accented speech. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Speech en_US
dc.subject Generative Model en_US
dc.subject Tacotron en_US
dc.subject Generator en_US
dc.subject Discriminator en_US
dc.subject Speech synthesis en_US
dc.subject Deep learning en_US
dc.subject Machine learning en_US
dc.subject ASR en_US
dc.subject Speech recognition en_US
dc.title Capturing mannerisms in speech data en_US
dc.type Other en_US
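
The abstract describes training the generative model on a weighted sum of two error functions, one from the ASR and one from the accent discriminator. The sketch below illustrates one possible form of that sum. The function name, the parameter names, and the choice of a single weight in [0, 1] (a convex combination) are assumptions made for illustration; the record does not state how the weights are actually chosen.

```python
def combined_generator_loss(asr_loss: float, accent_loss: float,
                            accent_weight: float = 0.5) -> float:
    """Weighted sum of a comprehensibility error and an accent error.

    asr_loss      -- error reported by the ASR on the generated speech
    accent_loss   -- error reported by the accent discriminator
    accent_weight -- hypothetical knob in [0, 1]; higher values favour
                     accent replication over comprehensibility
    """
    if not 0.0 <= accent_weight <= 1.0:
        raise ValueError("accent_weight must lie in [0, 1]")
    # Assumed convex combination: the two weights sum to one.
    return (1.0 - accent_weight) * asr_loss + accent_weight * accent_loss


if __name__ == "__main__":
    # Illustrative values only, not results from the report.
    print(combined_generator_loss(asr_loss=2.0, accent_loss=4.0, accent_weight=0.25))
```

Setting accent_weight to 0 would train the generator on comprehensibility alone, and setting it to 1 would train it on accent replication alone; intermediate values trade one off against the other.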