Abstract:
Explaining the predictions of black-box deep learning models is challenging, especially when neither the model details nor its training data are known. Various shadow-model-based techniques have been proposed to explain the predictions of a target black-box model. The quality of a shadow model depends directly on the dataset used to train it. Previous work has shown that it is important to replicate the data-view captured by the black-box deep learning model to create effective interpretable shadow models, where a data-view is defined as a set of representative data instances classified correctly by the model. However, a randomly created dataset used as a data-view may not lead to a good shadow model. In this work, we present a mechanism to create a good data-view by learning the process of creating good shadow models vis-à-vis the target model. Our data-view synthesis method uses query synthesis: we train a binary classifier to distinguish data instances into good and bad classes with respect to the task of explaining the target deep learning model, and subsequently use the good data records to train interpretable models such as Decision Trees and Explainable Neural Networks (xNN). We extensively evaluate our approach on a black-box model trained on public datasets and demonstrate its performance in explanation generation.
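The filter-then-fit pipeline sketched in the abstract can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the candidate records, the "goodness" labels, and the stand-in black-box predictor are all hypothetical, and logistic regression is used only as an example binary selector.

```python
# Illustrative sketch: filter candidate records with a binary
# good-vs-bad classifier, then fit an interpretable shadow model
# (a decision tree) on the records labeled good.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-ins: X_cand are synthesized query records;
# target_predict mimics the black-box model's label function.
X_cand = rng.normal(size=(500, 4))
target_predict = lambda X: (X[:, 0] + X[:, 1] > 0).astype(int)

# Assume an earlier round tagged each record as useful ("good") or not
# for explaining the target model; here the tag is simulated by a rule.
goodness = (np.abs(X_cand[:, 0]) > 0.5).astype(int)

# Step 1: train a binary classifier to separate good from bad records.
selector = LogisticRegression().fit(X_cand, goodness)

# Step 2: keep only the records the selector deems good (the data-view).
X_good = X_cand[selector.predict(X_cand) == 1]

# Step 3: train the interpretable shadow model on the good data-view,
# using the black-box model's own predictions as labels.
shadow = DecisionTreeClassifier(max_depth=3).fit(X_good, target_predict(X_good))
```

The resulting shallow tree serves as the interpretable surrogate whose paths can be read as explanations of the target model's behavior on the synthesized data-view.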