Entity extraction from Wikipedia list pages


Heist, Nicolas; Paulheim, Heiko



DOI: https://doi.org/10.1007/978-3-030-49461-2_19
URL: https://link.springer.com/chapter/10.1007%2F978-3-...
Additional URL: https://arxiv.org/abs/2003.05146
Document Type: Conference or workshop publication
Year of publication: 2020
Book title: The Semantic Web : 17th International Conference, ESWC 2020, Heraklion, Crete, Greece, May 31-June 4, 2020, Proceedings
Journal or series title: Lecture Notes in Computer Science
Volume: 12123
Page range: 327-342
Conference title: ESWC 2020
Location of the conference venue: Online
Date of the conference: 31.05.-04.06.2020
Editor: Harth, Andreas
Place of publication: Berlin [et al.]
Publishing house: Springer
ISBN: 978-3-030-49460-5, 978-3-030-49462-9, 978-3-030-49461-2
ISSN: 0302-9743, 1611-3349
Publication language: English
Institution: School of Business Informatics and Mathematics > Data Science (Paulheim 2018-)
Subject: 004 Computer science, internet
Abstract: When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia, and connecting them through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. Especially, as Wikipedia’s policies permit pages about subjects only if they have a certain popularity, such graphs tend to lack information about less well-known entities. Information about these entities is oftentimes available in the encyclopedia, but not represented as an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia’s list pages, which have proven to serve as a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we extract training data for the identification of new entities in list pages that we use in the second phase to train a classification model. With this approach we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
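The abstract outlines a two-phase pipeline: a taxonomy built from categories and list pages with DBpedia as a backbone provides the expected type for each list page, distant supervision uses already-linked entities to label list-page entries, and a classification model then identifies new entities among the remaining entries. The following minimal Python sketch only illustrates that general idea; the data records, feature set, example list page, and classifier choice are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): distant supervision over
# list-page entries, then a classifier that decides which entries denote
# entities of the list page's expected type. All data structures, features,
# and the classifier are hypothetical assumptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical entries from one Wikipedia list page, with the DBpedia type of
# the entity they link to (if any) and simple layout features.
entries = [
    {"list_page": "List of German physicists", "mention": "Albert Einstein",
     "linked_type": "Physicist", "position": 1, "has_link": True},
    {"list_page": "List of German physicists", "mention": "Physics in Germany",
     "linked_type": "Concept", "position": 2, "has_link": True},
    {"list_page": "List of German physicists", "mention": "Hildegard Stücklen",
     "linked_type": None, "position": 3, "has_link": False},  # not yet in DBpedia
]

# Expected type per list page, as the taxonomy phase would provide it.
EXPECTED_TYPE = {"List of German physicists": "Physicist"}

def distant_label(entry):
    """Phase 1 (distant supervision): entries already linked to an entity of
    the expected type are positives, entries linked to an entity of another
    type are negatives, and unlinked entries remain unlabelled."""
    if entry["linked_type"] is None:
        return None
    return int(entry["linked_type"] == EXPECTED_TYPE[entry["list_page"]])

def features(entry):
    # Toy feature set; the paper uses far richer positional and lexical features.
    return {"position": entry["position"],
            "has_link": int(entry["has_link"]),
            "mention_tokens": len(entry["mention"].split())}

labelled = [(features(e), distant_label(e)) for e in entries
            if distant_label(e) is not None]
X, y = zip(*labelled)

# Phase 2: train on the distantly labelled entries and apply the model to the
# unlabelled ones to find new entities.
vec = DictVectorizer()
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(vec.fit_transform(X), list(y))

unlabelled = [features(e) for e in entries if distant_label(e) is None]
print(clf.predict(vec.transform(unlabelled)))  # 1 = likely a new entity of the expected type
```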




This entry is part of the university bibliography.



