Entity extraction from Wikipedia list pages
Heist, Nicolas; Paulheim, Heiko
DOI: https://doi.org/10.1007/978-3-030-49461-2_19
URL: https://link.springer.com/chapter/10.1007%2F978-3-...
Additional URL: https://arxiv.org/abs/2003.05146
Document Type: Conference or workshop publication
Year of publication: 2020
Book title: The Semantic Web: 17th International Conference, ESWC 2020, Heraklion, Crete, Greece, May 31-June 4, 2020, Proceedings
Journal or publication series: Lecture Notes in Computer Science
Volume: 12123
Page range: 327-342
Conference title: ESWC 2020
Location of the conference venue: Online
Date of the conference: 31.05.-04.06.2020
Editor: Harth, Andreas
Place of publication: Berlin [et al.]
Publishing house: Springer
ISBN: 978-3-030-49460-5, 978-3-030-49462-9, 978-3-030-49461-2
ISSN: 0302-9743, 1611-3349
Publication language: English
Institution: School of Business Informatics and Mathematics > Data Science (Paulheim 2018-)
Subject: 004 Computer science, internet
Abstract:
When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia, and connecting them through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. Especially, as Wikipedia’s policies permit pages about subjects only if they have a certain popularity, such graphs tend to lack information about less well-known entities. Information about these entities is oftentimes available in the encyclopedia, but not represented as an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia’s list pages, which have proven to serve as a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we extract training data for the identification of new entities in list pages that we use in the second phase to train a classification model. With this approach we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
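The following is a minimal sketch of the distant-supervision idea described in the abstract, not the authors' implementation: list-page entries that already link to a DBpedia entity carrying the list's inferred type become positive training examples, entries linked to entities with other types become negatives, and the resulting labels are used to train a classifier that can then judge unlinked entries. The data structures (ListEntry, dbpedia_types) and the feature set are hypothetical placeholders; the paper's actual pipeline operates over the full category/list-page taxonomy built in the first phase.

```python
# Hedged sketch of distant supervision for list-page entries (assumed data model,
# not the paper's code). Labels are derived from existing DBpedia type assertions.
from dataclasses import dataclass
from typing import Optional
from sklearn.linear_model import LogisticRegression

@dataclass
class ListEntry:
    text: str                      # surface form of the entry on the list page
    linked_entity: Optional[str]   # DBpedia resource it links to, if any
    features: list                 # numeric features (position, formatting, ...)

def label_entries(entries, list_type, dbpedia_types):
    """Distant supervision: turn existing DBpedia type assertions into labels."""
    X, y = [], []
    for e in entries:
        if e.linked_entity is None:
            continue  # unlinked entries are the ones the trained model will classify
        types = dbpedia_types.get(e.linked_entity, set())
        if list_type in types:
            X.append(e.features)
            y.append(1)  # positive: linked entity has the list's inferred type
        elif types:
            X.append(e.features)
            y.append(0)  # negative: entity is typed, but not with the list's type
    return X, y

def train_classifier(X, y):
    """Second phase: train a model that decides which entries denote new entities."""
    return LogisticRegression(max_iter=1000).fit(X, y)
```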
This entry is part of the university bibliography.