Abstract:
Sound sources are ubiquitous in everyday life, yet a single audio source is seldom heard in isolation. Many applications, such as speech recognition, require an isolated sound source, which makes audio source separation an important problem. In this thesis, we focus on the single channel source separation (SCSS) problem, i.e., separating the individual sources from a single observed mixture. Recovering many unknowns from a single equation makes the problem ill-posed, forcing the use of prior information for better separation.
Model-based methods for single channel source separation use prior information in the form of learned bases. For similar signals, such as two speech sources, the models overlap heavily, making separation difficult; the sources should therefore be modeled with appropriate bases/structure for effective separation. Along with the model itself, the parameters of the model play a vital role in determining the quality of separation. In any model, a higher dimension (number of columns) makes it a good fit for the source, but for similar sources it also makes it a good fit for the other source. The dimensions of the models are thus an important factor in the discrimination the models provide, and hence in the quality of separation. Moreover, separating one source at a time from the mixture frees the problem from having to balance the reconstruction of all the sources simultaneously, improving separation performance.
In this thesis, we propose a novel discriminative learning framework for source separation of audio signals observed as a single mixture. The framework is generic: we separate one source at a time and embed our dimension search algorithm in the training of discriminative source models. We apply the framework to an NMF-based SCSS algorithm, and we also propose an alternative structure that combines a dictionary and a subspace for learning source models. We demonstrate improved separation performance for both speech-speech and speech-music mixtures.
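To make the NMF-based SCSS setting concrete, the following is a minimal sketch of supervised NMF separation: bases are learned per source from training spectrograms, the mixture is decomposed onto the concatenated bases with only the activations updated, and Wiener-style masks yield the source estimates. This is an illustrative baseline, not the discriminative framework proposed in the thesis; the data here are random arrays standing in for magnitude spectrograms, and the ranks `r1`, `r2` correspond to the model dimensions discussed above.

```python
import numpy as np

def nmf(V, r, n_iter=200, seed=0):
    """Euclidean NMF via multiplicative updates: V (F x T) ~= W (F x r) @ H (r x T)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, r)) + 1e-3
    H = rng.random((r, T)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update bases
    return W, H

# Toy non-negative "spectrograms" for two training sources (placeholder data).
rng = np.random.default_rng(1)
S1 = rng.random((64, 100))
S2 = rng.random((64, 100))

# Learn per-source bases; r1 and r2 are the model dimensions (number of columns).
r1, r2 = 10, 10
W1, _ = nmf(S1, r1)
W2, _ = nmf(S2, r2)

# Decompose the mixture on the fixed, concatenated bases: only H is updated.
V = rng.random((64, 50))            # stands in for the mixture spectrogram
W = np.hstack([W1, W2])             # (64, r1 + r2)
H = np.random.default_rng(2).random((r1 + r2, 50)) + 1e-3
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)

# Wiener-style masking from each source's partial reconstruction.
V1_hat = W1 @ H[:r1]
V2_hat = W2 @ H[r1:]
total = V1_hat + V2_hat + 1e-9
src1 = V * (V1_hat / total)
src2 = V * (V2_hat / total)
```

Because the masks sum to (nearly) one, the two estimates add back up to the mixture; the overlap between `W1` and `W2` for similar sources is exactly what limits the discrimination this baseline can achieve.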