Capturing mannerisms in speech data

Nambiar, Pranav; Arora, Chetan (Advisor); Anand, Saket (Advisor)

dc.contributor.author	Nambiar, Pranav
dc.contributor.author	Arora, Chetan (Advisor)
dc.contributor.author	Anand, Saket (Advisor)
dc.date.accessioned	2019-10-07T07:05:22Z
dc.date.available	2019-10-07T07:05:22Z
dc.date.issued	2018-04-30
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/749
dc.description.abstract	Text-to-speech systems generally require large amounts of annotated speech data, and the quality of both the annotations and the speech strongly affects the results. As a result, most research has been performed on highly curated data collected in labs. This is not a major obstacle when generating widely spoken varieties such as American English, but it becomes a problem when generating less common accents: it is not viable to convene a focus group and build a dataset every time a new type of accented speech is needed. This report explores a new approach to generating less common accented speech, using a popular labelled dataset to learn the language and a separate unlabelled dataset of the accent we wish to learn. The approach uses generative adversarial networks (GANs), a concept introduced by Ian Goodfellow in 2014. It aims to balance comprehensibility against accent replication by using an automatic speech recognition (ASR) system in conjunction with a discriminator trained to recognize the accent we wish to recreate. The balance can be maintained by training the generative model on a weighted sum of the two error functions. We intend to first use the proposed model to recreate Chinese-accented English as a sanity check of the model's correctness, after which we will tackle more obscure tasks such as animal-accented speech.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIITD-Delhi	en_US
dc.subject	Speech	en_US
dc.subject	Generative Model	en_US
dc.subject	Tacotron	en_US
dc.subject	Generator	en_US
dc.subject	Discriminator	en_US
dc.subject	Speech synthesis	en_US
dc.subject	Deep learning	en_US
dc.subject	Machine learning	en_US
dc.subject	ASR	en_US
dc.subject	Speech recognition	en_US
dc.title	Capturing mannerisms in speech data	en_US
dc.type	Other	en_US