Language-agnostic relation extraction from abstracts in Wikis


Heist, Nicolas ; Hertling, Sven ; Paulheim, Heiko


DOI: https://doi.org/10.3390/info9040075
URL: http://www.mdpi.com/2078-2489/9/4/75
Document Type: Article
Year of publication: 2018
Journal or publication series: Information
Volume: 9
Issue number: 4
Page range: 75
Place of publication: Basel
Publishing house: MDPI Publ.
ISSN: 2078-2489
Publication language: English
Institution: School of Business Informatics and Mathematics > Web Data Mining (Paulheim 2018-)
Subject: 004 Computer science, internet
Abstract: Large-scale knowledge graphs, such as DBpedia, Wikidata, or YAGO, can be enhanced by relation extraction from text, using the data in the knowledge graph as training data, i.e., using distant supervision. While most existing approaches use language-specific methods (usually for English), we present a language-agnostic approach that exploits background knowledge from the graph instead of language-specific techniques and builds machine learning models only from language-independent features. We demonstrate the extraction of relations from Wikipedia abstracts, using the twelve largest language editions of Wikipedia. From those, we can extract 1.6 M new relations in DBpedia at a level of precision of 95%, using a RandomForest classifier trained only on language-independent features. We furthermore investigate the similarity of models for different languages and show an exemplary geographical breakdown of the information extracted. In a second series of experiments, we show how the approach can be transferred to DBkWik, a knowledge graph extracted from thousands of Wikis. We discuss the challenges and first results of extracting relations from a larger set of Wikis, using a less formalized knowledge graph.
Additional information: Online resource

This entry is part of the university bibliography.




Citation example:

Heist, Nicolas (ORCID: 0000-0002-4354-9138) ; Hertling, Sven (ORCID: 0000-0003-0333-5888) ; Paulheim, Heiko (ORCID: 0000-0003-4386-8195) (2018): Language-agnostic relation extraction from abstracts in Wikis. Information, Basel, 9 (4), 75 [Article]

