Explaining neural networks without access to training data


Marton, Sascha ; Lüdtke, Stefan ; Bartelt, Christian ; Tschalzev, Andrej ; Stuckenschmidt, Heiner


PDF: s10994-023-06428-4-1.pdf (Published, 2MB)

DOI: https://doi.org/10.1007/s10994-023-06428-4
URL: https://link.springer.com/article/10.1007/s10994-0...
Additional URL: https://arxiv.org/abs/2206.04891
URN: urn:nbn:de:bsz:180-madoc-663926
Document Type: Article
Year of publication: 2024
Journal or publication series: Machine Learning
Volume: 113
Issue number: 6
Page range: 3633-3652
Place of publication: Dordrecht [u.a.]
Publishing house: Springer
ISSN: 0885-6125, 1573-0565
Publication language: English
Institution: School of Business Informatics and Mathematics > Practical Computer Science II: Artificial Intelligence (Stuckenschmidt 2009-)
Non-faculty institutions > Institut für Enterprise Systems (InES)
Pre-existing license: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 004 Computer science, internet
Keywords (English): explainable Artificial Intelligence (xAI), machine learning, Artificial Intelligence, decision trees
Abstract: We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety issues. Recently, Interpretation Nets (I-Nets) have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the I-Net framework to the cases of standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and design of the corresponding I-Net output layers. Furthermore, we make I-Nets applicable to real-world tasks by considering more realistic distributions when generating the I-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible.
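
The mapping described in the abstract, from network parameters to a representation of an interpretable function, can be made concrete with a small sketch. The vector layout below (per-node feature scores plus a threshold, followed by leaf values) is a hypothetical encoding chosen for illustration and is not taken from the paper; the function names `flatten_parameters`, `decode_tree`, and `predict` are likewise assumptions.

```python
from typing import List, Sequence, Tuple


def flatten_parameters(layers: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Concatenate all weight matrices (lists of rows) into one flat vector."""
    return [w for matrix in layers for row in matrix for w in row]


def decode_tree(
    vector: Sequence[float], depth: int, n_features: int
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Decode a flat vector into a complete binary decision tree.

    Hypothetical layout (an assumption, not the paper's encoding):
    for each of the 2**depth - 1 internal nodes, n_features scores
    (the argmax selects the split feature) followed by one threshold;
    then one value per leaf (2**depth leaves).
    """
    n_internal = 2 ** depth - 1
    n_leaves = 2 ** depth
    expected = n_internal * (n_features + 1) + n_leaves
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    splits = []
    pos = 0
    for _ in range(n_internal):
        scores = vector[pos:pos + n_features]
        feature = max(range(n_features), key=lambda i: scores[i])
        threshold = vector[pos + n_features]
        splits.append((feature, threshold))
        pos += n_features + 1
    leaves = list(vector[pos:])
    return splits, leaves


def predict(
    splits: Sequence[Tuple[int, float]], leaves: Sequence[float], x: Sequence[float]
) -> float:
    """Route x through the tree; nodes are stored in breadth-first order."""
    node = 0
    n_internal = len(splits)
    while node < n_internal:
        feature, threshold = splits[node]
        # Go left if the feature value is at or below the threshold, else right.
        node = 2 * node + 1 if x[feature] <= threshold else 2 * node + 2
    return leaves[node - n_internal]


# A depth-1 tree over two features: split on feature 1 at threshold 0.5.
splits, leaves = decode_tree([0.1, 0.9, 0.5, 0.0, 1.0], depth=1, n_features=2)
low = predict(splits, leaves, [0.0, 0.3])
high = predict(splits, leaves, [0.0, 0.7])
```

In a setup of this kind, a model trained on many (flattened parameters, tree vector) pairs would only ever see the network's weights and never its training data. A fixed-length output vector is what lets a tree of fixed depth be produced by an ordinary output layer.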




This entry is part of the university bibliography.

The document is provided by the publication server of the Universitätsbibliothek Mannheim.



