Explaining neural networks without access to training data

Marton, Sascha; Lüdtke, Stefan; Bartelt, Christian; Tschalzev, Andrej; Stuckenschmidt, Heiner

DOI:	https://doi.org/10.1007/s10994-023-06428-4
URL:	https://link.springer.com/article/10.1007/s10994-0...
Additional URL:	https://arxiv.org/abs/2206.04891
URN:	urn:nbn:de:bsz:180-madoc-663926
Document type:	Journal article
Year of publication:	2024
Journal or series title:	Machine Learning
Volume:	113
Issue:	6
Page range:	3633-3652
Place of publication:	Dordrecht [et al.]
Publisher:	Springer
ISSN:	0885-6125, 1573-0565
Language of publication:	English
Institution:	School of Business Informatics and Mathematics > Practical Computer Science II: Artificial Intelligence (Stuckenschmidt 2009-); Non-faculty institutions > Institute for Enterprise Systems (InES)
License:	Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject:	004 Computer science
Keywords (English):	explainable Artificial Intelligence (xAI), machine learning, Artificial Intelligence, decision trees
Abstract:	We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety issues. Recently, Interpretation Nets (I-Nets) have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. They formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the I-Net framework to the cases of standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and a design for the corresponding I-Net output layers. Furthermore, we make I-Nets applicable to real-world tasks by considering more realistic distributions when generating the I-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible.
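As a rough illustration of what a decision-tree output representation for an I-Net could look like, the Python sketch below stores a complete soft decision tree as one flat parameter vector (the kind of vector an output layer could emit), evaluates it, and measures its agreement (fidelity) with a network on synthetically sampled inputs instead of training data. The I-Net itself, i.e. the mapping from network parameters to this vector, is not shown. The flat layout, the routing convention (sigmoid probability of going right), the function names `unpack`, `leaf_probabilities`, `predict` and `fidelity`, the toy network and the uniform sampling distribution are all hypothetical assumptions for illustration, not the representation or data-generation procedure used in the paper.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class SoftTree:
    # Complete binary soft decision tree in breadth-first (heap) order:
    # inner node i has children 2i+1 (left) and 2i+2 (right).
    weights: List[List[float]]  # one weight vector per inner node
    biases: List[float]         # one bias per inner node
    leaf_values: List[float]    # one output value per leaf, left to right
    depth: int


def unpack(params: Sequence[float], n_features: int, depth: int) -> SoftTree:
    # Hypothetical flat layout, as an I-Net output layer might emit it:
    # [inner-node weights | inner-node biases | leaf values]
    n_inner, n_leaves = 2 ** depth - 1, 2 ** depth
    expected = n_inner * n_features + n_inner + n_leaves
    if len(params) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(params)}")
    weights = [list(params[i * n_features:(i + 1) * n_features]) for i in range(n_inner)]
    offset = n_inner * n_features
    biases = list(params[offset:offset + n_inner])
    leaf_values = list(params[offset + n_inner:])
    return SoftTree(weights, biases, leaf_values, depth)


def leaf_probabilities(tree: SoftTree, x: Sequence[float]) -> List[float]:
    # Probability of reaching every node, propagated top-down; parents always
    # precede their children in heap order, so one forward pass suffices.
    n_inner = 2 ** tree.depth - 1
    reach = [0.0] * (2 * n_inner + 1)
    reach[0] = 1.0
    for i in range(n_inner):
        z = sum(w * xi for w, xi in zip(tree.weights[i], x)) + tree.biases[i]
        p_right = sigmoid(z)
        reach[2 * i + 1] = reach[i] * (1.0 - p_right)
        reach[2 * i + 2] = reach[i] * p_right
    return reach[n_inner:]


def predict(tree: SoftTree, x: Sequence[float]) -> float:
    # Expected leaf value under the soft routing distribution.
    return sum(p * v for p, v in zip(leaf_probabilities(tree, x), tree.leaf_values))


def fidelity(network: Callable[[Sequence[float]], float], tree: SoftTree,
             samples: Sequence[Sequence[float]], threshold: float = 0.5) -> float:
    # Share of sampled inputs on which surrogate and network predict the same class.
    agree = sum((network(x) >= threshold) == (predict(tree, x) >= threshold) for x in samples)
    return agree / len(samples)


if __name__ == "__main__":
    # Hypothetical stand-in for a trained network; only its outputs are queried here.
    def toy_network(x: Sequence[float]) -> float:
        return sigmoid(4.0 * x[0])

    # Inputs drawn uniformly from [-1, 1]^2 (an assumption), not from training data.
    random.seed(0)
    samples = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(1000)]
    tree = unpack([4.0, 0.0, 0.0, 0.0, 1.0], n_features=2, depth=1)
    print(fidelity(toy_network, tree, samples))
```

In the usage example the depth-1 tree reproduces the toy network exactly, so the printed fidelity is 1.0. A flat vector was chosen because it maps directly onto a fixed-size output layer; a hard (standard) decision tree would additionally need a discrete split representation, which this sketch does not cover.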