Frequency domain gradient visualization for acoustic models

Thakran, Yash; Abrol, Vinayak (Advisor)

Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1533

Full metadata record

DC Field	Value	Language
dc.contributor.author	Thakran, Yash	-
dc.contributor.author	Abrol, Vinayak (Advisor)	-
dc.date.accessioned	2024-05-20T08:56:13Z	-
dc.date.available	2024-05-20T08:56:13Z	-
dc.date.issued	2023-12-08	-
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/1533	-
dc.description.abstract	Modeling raw waveforms directly with neural networks for speech processing is gaining increasing attention. Despite varying degrees of success, a question remains: what kind of information do such neural networks capture or learn from the speech signal for different tasks? Such insight is valuable not only for advancing these techniques but also for better understanding speech signal characteristics. This paper takes a step in that direction by developing a gradient-based approach to estimate the relevance of each input speech sample to the output score. We show that analyzing the resulting “relevance signal” with conventional speech signal processing techniques can reveal the information modeled by the whole network. We demonstrate the potential of the proposed approach by analyzing raw-waveform CNN-based automatic speech recognition and speaker verification systems.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	deep learning	en_US
dc.subject	CNN visualization	en_US
dc.subject	gradients	en_US
dc.subject	raw waveforms	en_US
dc.title	Frequency domain gradient visualization for acoustic models	en_US
dc.type	Other	en_US
Appears in Collections:	Year-2023

Files in This Item:

File	Description	Size	Format
BTP_Report_Yash Thakran_2020269 - Yash Thakran.pdf (Restricted Access)		971.27 kB	Adobe PDF
