Explainability of black box deep learning models and bias detection

Arora, Kushagr; Goyal, Vikram (Advisor)

dc.contributor.author	Arora, Kushagr
dc.contributor.author	Goyal, Vikram (Advisor)
dc.date.accessioned	2022-03-31T06:07:46Z
dc.date.available	2022-03-31T06:07:46Z
dc.date.issued	2021-05
dc.identifier.uri	http://repository.iiitd.edu.in/xmlui/handle/123456789/987
dc.description.abstract	Explaining the predictions of black-box deep learning models is challenging, especially when neither the model details nor its training data are known. Various techniques based on shadow models have been proposed to explain the predictions of a target black-box model. The quality of a shadow model depends directly on the dataset used to train it. Previous work has shown that, to create effective interpretable shadow models, it is important to replicate the data-view captured by the black-box deep learning model, where a data-view is defined as a set of representative data instances classified correctly by a model. However, using a randomly created dataset as a data-view may not lead to a good shadow model. In this work, we present a mechanism to create a good data-view by learning how good shadow models are created with respect to the target model. Our method of data-view synthesis uses query synthesis: we train a binary classifier to separate data instances into good and bad classes with respect to the task of explaining the target deep learning model, and subsequently use the good data records to train interpretable models such as Decision Trees and Explainable Neural Networks (xNN). We extensively evaluate our approach on a black-box model trained on public datasets and show its performance in explanation generation.	en_US
dc.language.iso	en_US	en_US
dc.publisher	IIIT-Delhi	en_US
dc.subject	Interpretability	en_US
dc.subject	Data view extraction	en_US
dc.subject	Shadow model	en_US
dc.subject	Data synthesis	en_US
dc.title	Explainability of black box deep learning models and bias detection	en_US
dc.type	Other	en_US
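
The shadow-model idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: black_box, synthesize, fit_tree, predict and fidelity are hypothetical names introduced here, the target model is a toy stand-in, and the shadow model is a small hand-written decision tree. The abstract does not specify how records are labeled good or bad, so that filtering step is left out and every synthesized record is used for training.

```python
import random

def black_box(x):
    """Hypothetical stand-in for the opaque target model (query access only)."""
    return 1 if x[0] + 0.5 * x[1] > 0.75 else 0

def synthesize(n, dim, rng):
    """Draw n random candidate records from the unit hypercube."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]

def majority(labels):
    """Most common binary label (ties go to 1)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0

def errors(labels):
    """Number of labels that disagree with the majority label."""
    m = majority(labels)
    return sum(lab != m for lab in labels)

def fit_tree(X, y, depth):
    """Greedy axis-aligned decision tree that minimizes misclassifications."""
    if depth == 0 or len(set(y)) <= 1:
        return majority(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            err = errors(left) + errors(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:
        return majority(y)
    _, f, t = best
    lo = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    hi = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            fit_tree([r for r, _ in lo], [l for _, l in lo], depth - 1),
            fit_tree([r for r, _ in hi], [l for _, l in hi], depth - 1))

def predict(tree, x):
    """Walk from the root to a leaf; leaves are plain 0/1 labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def fidelity(tree, X):
    """Fraction of records on which the shadow model agrees with the target."""
    return sum(predict(tree, x) == black_box(x) for x in X) / len(X)

rng = random.Random(0)
train = synthesize(300, 2, rng)                 # randomly created data-view
labels = [black_box(x) for x in train]          # labels obtained by querying the target
shadow = fit_tree(train, labels, depth=3)       # interpretable shadow model
score = fidelity(shadow, synthesize(500, 2, rng))
print(f"shadow-model fidelity on fresh queries: {score:.3f}")
```

Fidelity, the agreement rate between the shadow model and the target on fresh queries, is one common way to judge a shadow model. A filtering step like the one the abstract describes would sit between synthesize and fit_tree, keeping only the records classified as good.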