IIIT-Delhi Institutional Repository

Explainability of black box deep learning models and bias detection

dc.contributor.author Arora, Kushagr
dc.contributor.author Goyal, Vikram (Advisor)
dc.date.accessioned 2022-03-31T06:07:46Z
dc.date.available 2022-03-31T06:07:46Z
dc.date.issued 2021-05
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/987
dc.description.abstract Explaining the predictions of black-box deep learning models is challenging, especially when neither the model details nor its training data are known. Various shadow-model-based techniques have been proposed to explain the predictions of a target black-box model. The quality of a shadow model depends directly on the dataset used to train it. Previous work has shown that, to create effective interpretable shadow models, it is important to replicate the data-view captured by the black-box deep learning model, where a data-view is defined as a set of representative data instances classified correctly by a model. However, using a randomly created dataset as a data-view may not lead to a good shadow model. In this work, we present a mechanism to create a good data-view by learning the process of creating good shadow models vis-à-vis the target model. Our method of data-view synthesis uses query synthesis: we train a binary classifier to separate data instances into good and bad classes with respect to the task of explaining the target deep learning model, and subsequently use the good data records to train interpretable models such as Decision Trees and Explainable Neural Networks (xNN). We extensively evaluate our approach on a black-box model trained on public datasets and show its performance in explanation generation. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Interpretability en_US
dc.subject Data view extraction en_US
dc.subject Shadow model en_US
dc.subject Data synthesis en_US
dc.title Explainability of black box deep learning models and bias detection en_US
dc.type Other en_US
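
The pipeline outlined in the abstract (query the black box on synthesized data, then fit an interpretable shadow model to its answers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `black_box`, the uniform `synthesize_queries`, and the depth-1 decision tree are all hypothetical stand-ins, and the learned good/bad filtering classifier described in the abstract is omitted.

```python
import random

def black_box(x):
    # Hypothetical stand-in for the opaque target model: we may only query it.
    return 1 if 0.8 * x[0] + 0.2 * x[1] > 0.5 else 0

def synthesize_queries(n, dim, rng):
    # Assumption: plain uniform sampling. The abstract argues that a learned
    # filter over such random queries yields a better data-view than this.
    return [[rng.random() for _ in range(dim)] for _ in range(n)]

def fit_stump(X, y):
    # Depth-1 decision tree: pick the (feature, threshold, polarity)
    # that disagrees least with the black-box labels.
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            preds = [1 if row[f] > t else 0 for row in X]
            errors = sum(p != label for p, label in zip(preds, y))
            # polarity 1 predicts 1 above the threshold; polarity 0 flips it.
            for polarity, err in ((1, errors), (0, len(y) - errors)):
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    _, f, t, polarity = best
    return f, t, polarity

def stump_predict(stump, x):
    f, t, polarity = stump
    return polarity if x[f] > t else 1 - polarity

def fidelity(stump, X, y):
    # Fraction of instances on which the shadow model agrees with the black box.
    agree = sum(stump_predict(stump, x) == label for x, label in zip(X, y))
    return agree / len(y)

rng = random.Random(0)

# Build a training set by querying the black box on synthesized instances.
train_X = synthesize_queries(200, 2, rng)
train_y = [black_box(x) for x in train_X]
shadow = fit_stump(train_X, train_y)

# Measure agreement on fresh queries the shadow model has not seen.
test_X = synthesize_queries(200, 2, rng)
test_y = [black_box(x) for x in test_X]
score = fidelity(shadow, test_X, test_y)
```

Here `score` is the fraction of held-out queries on which the shadow model agrees with the black box, a quantity often called fidelity. The fitted `shadow` tuple is directly readable as a rule of the form "feature f above threshold t".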

