Abstract:
Widely adopted locally interpretable methods such as LIME [20] and SHAP [12] fail to capture the underlying causal relationships between variables; they capture only linear and non-linear associations. These techniques assume the features to be independent, thereby precluding the concepts of moderation, confounding, and causation. In this work, Directed Acyclic Graphs (DAGs) are proposed as a novel method for obtaining locally interpretable, model-agnostic explanations of a model's individual predictions. The LIME [20] framework is extended to DAG-LIME, which introduces an active learning approach to learning DAGs by leveraging the NOTEARS [29] algorithm. By learning inter-variable causal relationships through DAGs, the aim is to provide causal interpretability rather than weighted associations for the instance of interest.