IIIT-Delhi Institutional Repository

Detection of doctored speech : towards an end-to-end parametric learn-able filter approach


dc.contributor.author Arora, Rohit
dc.contributor.author Anand, Saket (Advisor)
dc.contributor.author Mohan, Aanchan (Advisor)
dc.date.accessioned 2023-04-03T13:04:14Z
dc.date.available 2023-04-03T13:04:14Z
dc.date.issued 2022-05
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/1077
dc.description.abstract Automatic Speaker Verification (ASV) systems have potential in biometrics applications for logical access control and authentication. Much is at stake if an ASV system is compromised. This dissertation explores the following questions: What are the weaknesses of spoofing systems (that is, which parts of natural speech can spoofing attacks not replicate)? Which spectral features are best suited for distinguishing spoofed from natural speech? How efficient are traditional handcrafted features compared to End-to-End (E2E) deep learning-based architectures in detecting such attacks? Which frequency regions in spoofed samples help distinguish spoofed from natural speech? To explore these questions, the preliminary work of this dissertation presents a comparative analysis of the state-of-the-art wavelet-based and MFCC-based spoof detection techniques developed in (Novoselov et al., 2016) and (Alam et al., 2016a), respectively. Two datasets are used for evaluation. The results on ASVspoof 2015 justify our inclination towards wavelet-based features over MFCC features. The experiments on the ASVspoof 2019 database expose the limited reliability of traditional handcrafted features and give us more reason to move towards end-to-end deep neural networks and more recent techniques. For the subsequent experiments, we use the Sincnet architecture as our baseline. By replacing the Sinc layer with Wavelet Scattering Transform and Continuous Wavelet Transform layers, we obtain E2E deep learning models that we call WSTnet and CWTnet, respectively. The results obtained from the score-level fusion of our models, CWTnet and WSTnet, with Sincnet are encouraging and indicate that spectral diversity at the input feature level is an asset. 
The fusion model achieved 62% and 17% relative improvement over traditional handcrafted models and our Sincnet baseline, respectively, when evaluated on the modern spoofing attacks in ASVspoof 2019. The final scale distribution and the number of scales used in CWTnet are far from optimal for the task at hand. Our main aim was to fine-tune the scale parameter to gain insight into the frequency regions responsible for distinguishing spoofed from natural speech. However, manual fine-tuning and cross-validation would be very expensive in both computation and time. To solve this problem, we replaced the CWT layer in our CWTnet architecture with a Wavelet Deconvolution (WD) layer (Khan and Yener, 2018). As in CWTnet, this layer calculates the Discrete-Continuous Wavelet Transform, but it also optimizes the scale parameter using back-propagation. During the forward pass, the WD layer calculates the transform of the input signal and passes the resulting features to the subsequent layers of the network. During back-propagation, it computes the gradients of the loss function with respect to the scale parameters. The WDnet model achieved 26% and 7% relative improvement over the CWTnet and Sincnet models, respectively, when evaluated on the ASVspoof 2019 dataset. This suggests that WDnet extracts more generalized features than CWTnet, as it focuses only on the most important and relevant frequency regions. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Automatic Speaker Verification en_US
dc.subject spoofing system en_US
dc.subject MFCC en_US
dc.subject CWTnet en_US
dc.subject WSTnet en_US
dc.title Detection of doctored speech : towards an end-to-end parametric learn-able filter approach en_US
dc.type Thesis en_US
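
The abstract describes a Wavelet Deconvolution layer that applies a wavelet transform in the forward pass and adjusts its scale parameters from the gradient of the loss. The following is a minimal sketch of that idea, not the thesis implementation. It assumes a Ricker (Mexican hat) mother wavelet, a toy energy-based loss, and central finite differences standing in for back-propagated gradients; all function names and parameter values are hypothetical.

```python
import math

def ricker(t, s):
    """Ricker (Mexican hat) wavelet at offset t for scale s > 0 (assumed mother wavelet)."""
    u = t / s
    norm = 2.0 / (math.sqrt(3.0 * s) * math.pi ** 0.25)
    return norm * (1.0 - u * u) * math.exp(-0.5 * u * u)

def wavelet_kernel(s, half_width=16):
    """Sample the wavelet on integer offsets in [-half_width, half_width]."""
    return [ricker(t, s) for t in range(-half_width, half_width + 1)]

def conv_valid(signal, kernel):
    """Sliding dot product of the kernel with the signal, 'valid' positions only."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def forward(signal, scales):
    """Forward pass: one filtered output row per scale."""
    return [conv_valid(signal, wavelet_kernel(s)) for s in scales]

def loss(signal, scales):
    """Toy objective (assumption): negative mean output energy, summed over scales."""
    return -sum(sum(v * v for v in row) / len(row)
                for row in forward(signal, scales))

def scale_gradients(signal, scales, eps=1e-4):
    """Central finite differences of the loss with respect to each scale.
    A real layer would obtain these gradients by back-propagation."""
    grads = []
    for i in range(len(scales)):
        up = list(scales)
        up[i] += eps
        down = list(scales)
        down[i] -= eps
        grads.append((loss(signal, up) - loss(signal, down)) / (2.0 * eps))
    return grads

def train_scales(signal, scales, lr=0.05, steps=50):
    """Plain gradient descent on the scales, clamped to stay positive."""
    scales = list(scales)
    for _ in range(steps):
        grads = scale_gradients(signal, scales)
        scales = [max(0.5, s - lr * g) for s, g in zip(scales, grads)]
    return scales

# Usage: a sinusoid with a 16-sample period and two arbitrary starting scales.
signal = [math.sin(2.0 * math.pi * n / 16.0) for n in range(128)]
init = [2.0, 6.0]
trained = train_scales(signal, init)
print("initial scales:", init)
print("trained scales:", trained)
```

In this toy setting the scales drift towards the value at which the wavelet responds most strongly to the input frequency, which illustrates how learned scales can point to relevant frequency regions.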

