Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/1533
Full metadata record
DC Field | Value | Language
dc.contributor.author | Thakran, Yash | -
dc.contributor.author | Abrol, Vinayak (Advisor) | -
dc.date.accessioned | 2024-05-20T08:56:13Z | -
dc.date.available | 2024-05-20T08:56:13Z | -
dc.date.issued | 2023-12-08 | -
dc.identifier.uri | http://repository.iiitd.edu.in/xmlui/handle/123456789/1533 | -
dc.description.abstract | Directly modeling raw waveforms with neural networks for speech processing is gaining more and more attention. Despite its varied success, a question remains: what kind of information do such neural networks capture or learn from the speech signal for different tasks? Such insight is interesting not only for advancing these techniques but also for better understanding speech signal characteristics. This paper takes a step in that direction: we develop a gradient-based approach to estimate the relevance of each input speech sample to the output score. We show that analyzing the resulting “relevance signal” with conventional speech signal processing techniques can reveal the information modeled by the whole network. We demonstrate the potential of the proposed approach by analyzing raw waveform CNN-based automatic speech recognition and speaker verification systems. | en_US
dc.language.iso | en_US | en_US
dc.publisher | IIIT-Delhi | en_US
dc.subject | deep learning | en_US
dc.subject | CNN visualization | en_US
dc.subject | gradients | en_US
dc.subject | raw waveforms | en_US
dc.title | Frequency domain gradient visualization for acoustic models | en_US
dc.type | Other | en_US
Appears in Collections: Year-2023
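
The abstract describes estimating the relevance of each input sample from the gradient of the output score, then inspecting that "relevance signal" with standard signal processing. The sketch below illustrates the idea only; it is not the report's implementation. It assumes a hypothetical toy model (one convolution filter, ReLU, sum over time) whose gradient can be written analytically in NumPy, and the names `toy_score`, `relevance_signal`, and `relevance_spectrum`, along with the sampling rate, filter, and test tone, are all illustrative choices.

```python
import numpy as np

def toy_score(x, w):
    """Output score of a hypothetical one-filter CNN: cross-correlate, ReLU, sum over time."""
    y = np.correlate(x, w, mode="valid")
    return np.maximum(y, 0.0).sum()

def relevance_signal(x, w):
    """Gradient of toy_score with respect to every input sample (same length as x)."""
    y = np.correlate(x, w, mode="valid")
    mask = (y > 0).astype(float)               # derivative of the ReLU at each output frame
    # Backpropagating through a valid cross-correlation is a full convolution with the filter.
    return np.convolve(mask, w, mode="full")

def relevance_spectrum(rel, fs):
    """Magnitude spectrum of the relevance signal, with the matching frequency axis in Hz."""
    freqs = np.fft.rfftfreq(len(rel), d=1.0 / fs)
    mag = np.abs(np.fft.rfft(rel))
    return freqs, mag

# Assumed setup: 16 kHz sampling, a 1 kHz tone plus weak noise as the "speech" input,
# and a Hann-windowed cosine filter tuned to the same frequency.
fs, T, K, f0 = 16000, 2048, 64, 1000.0
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * f0 * np.arange(T) / fs) + 0.01 * rng.standard_normal(T)
w = np.hanning(K) * np.cos(2 * np.pi * f0 * np.arange(K) / fs)

rel = relevance_signal(x, w)
freqs, mag = relevance_spectrum(rel, fs)
peak_hz = freqs[np.argmax(mag)]
print(f"relevance spectrum peaks near {peak_hz:.1f} Hz")
```

For a real acoustic model the gradient would come from an automatic differentiation framework rather than a hand-written formula; the frequency-domain analysis of the resulting signal would follow the same pattern.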

Files in This Item:
File | Description | Size | Format
BTP_Report_Yash Thakran_2020269 - Yash Thakran.pdf | Restricted Access | 971.27 kB | Adobe PDF

