In this paper, we aim to explain the decisions of neural networks using multimodal information: specifically, counter-intuitive attributes and counter visual examples, which appear when perturbed samples are introduced. Unlike previous work …