Abstract:
Explaining the predictions of black-box deep learning models is challenging, especially when neither the model details nor its training data are known. Various shadow-model-based techniques have been proposed to explain the predictions of a target black-box model. The quality of a shadow model depends directly on the dataset used to train it. Previous work has shown that it is important to replicate the data-view captured by the black-box deep learning model to create effective interpretable shadow models, where a data-view is defined as a set of representative data instances classified correctly by the model. However, a randomly created dataset used as a data-view may not lead to a good shadow model. In this work, we present a mechanism to create a good data-view by learning the process of creating good shadow models vis-à-vis the target model. Our data-view synthesis method uses query synthesis: we train a binary classifier to distinguish data instances into good and bad classes with respect to the task of explaining the target deep learning model, and subsequently use the good data records to train interpretable models such as Decision Trees and Explainable Neural Networks (xNN). We extensively evaluate our approach on a black-box model trained on public datasets and demonstrate its performance in explanation generation.
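The filter-then-fit pipeline sketched in the abstract can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the candidate records, the "goodness" labels, and the stand-in black-box predictor are all hypothetical, and logistic regression is used only as an example binary selector.

```python
# Illustrative sketch: filter candidate records with a binary
# good-vs-bad classifier, then fit an interpretable shadow model
# (a decision tree) on the records labeled good.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-ins: X_cand are synthesized query records;
# target_predict mimics the black-box model's label function.
X_cand = rng.normal(size=(500, 4))
target_predict = lambda X: (X[:, 0] + X[:, 1] > 0).astype(int)

# Assume an earlier round tagged each record as useful ("good") or not
# for explaining the target model; here the tag is simulated by a rule.
goodness = (np.abs(X_cand[:, 0]) > 0.5).astype(int)

# Step 1: train a binary classifier to separate good from bad records.
selector = LogisticRegression().fit(X_cand, goodness)

# Step 2: keep only the records the selector deems good (the data-view).
X_good = X_cand[selector.predict(X_cand) == 1]

# Step 3: train the interpretable shadow model on the good data-view,
# using the black-box model's own predictions as labels.
shadow = DecisionTreeClassifier(max_depth=3).fit(X_good, target_predict(X_good))
```

The resulting shallow tree serves as the interpretable surrogate whose paths can be read as explanations of the target model's behavior on the synthesized data-view.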