Knowledge graph embeddings: link prediction and beyond

Ruffinelli, Daniel

URN:	urn:nbn:de:bsz:180-madoc-660210
Document type:	Doctoral dissertation
Year of publication:	2023
Place of publication:	Mannheim
University:	Universität Mannheim
Reviewer:	Gemulla, Rainer
Date of oral examination:	22 November 2023
Language of publication:	English
Institution:	Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Practical Computer Science I: Data Analytics (Gemulla 2014-)
License:	Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject:	004 Computer science
Keywords (English):	knowledge graphs, representation learning, machine learning
Abstract:	Knowledge graph embeddings, or KGEs, are models that learn vector representations of knowledge graphs. These representations have been used for tasks such as predicting missing links in the graph, or as pre-trained representations that encode structured data for downstream applications such as question answering or recommender systems. Despite the large number of models developed for this purpose, the variety in experimental settings has made it difficult to compare results across studies: models are often trained using different training and hyperparameter optimization strategies. In addition, most of the literature has focused on a specific form of predicting missing links, known as link prediction. Almost no attention has been given to predicting other types of structures in a knowledge graph, and despite the use of KGE models in downstream applications, there are virtually no studies on their usability as pre-trained representations of knowledge graphs. In this thesis, we propose new training and evaluation methods and conduct several large-scale empirical studies, all aimed at studying KGE models as a form of knowledge representation. First, we compare model performance in a fair experimental setting that allows us to separate the contributions of new models from those of new training strategies. We find that differences in training approaches, and not necessarily in model architectures, may account for much of the previously reported progress in link prediction. Second, we study some potential limitations that may result from focusing almost exclusively on the link prediction task in KGE research. We find that good link prediction models are not necessarily able to successfully predict missing links in a knowledge graph, and that link prediction performance is not an indication that models generally capture information in the graph. This contradicts the common argument that KGE models are generally able to preserve the structure in a knowledge graph. Finally, we look beyond the link prediction task and study different training objectives aimed at capturing more information in the graph, and the impact that the resulting representations have on downstream applications. We find that models trained with the standard approach based on link prediction do not capture as much information about the graph as possible, and that link prediction performance is also not a good indicator of good downstream performance. These results suggest that the relation between pre-training objectives and downstream performance is not as clear as suggested in the literature, and that more research is needed to better understand how to learn generally useful representations of knowledge graphs.