Entity extraction from Wikipedia list pages
Heist, Nicolas; Paulheim, Heiko
DOI: https://doi.org/10.1007/978-3-030-49461-2_19
URL: https://link.springer.com/chapter/10.1007%2F978-3-...
Additional URL: https://arxiv.org/abs/2003.05146
Document type: Conference publication
Year of publication: 2020
Book title: The Semantic Web : 17th International Conference, ESWC 2020, Heraklion, Crete, Greece, May 31-June 4, 2020, Proceedings
Journal or series title: Lecture Notes in Computer Science
Volume: 12123
Page range: 327-342
Event title: ESWC 2020
Event location: Online
Event date: 31.05.-04.06.2020
Editor: Harth, Andreas
Place of publication: Berlin [et al.]
Publisher: Springer
ISBN: 978-3-030-49460-5, 978-3-030-49462-9, 978-3-030-49461-2
ISSN: 0302-9743, 1611-3349
Language of publication: English
Institution: School of Business Informatics and Mathematics > Data Science (Paulheim 2018-)
Subject area: 004 Computer science
Abstract: When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia, and connecting them through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. Especially, as Wikipedia’s policies permit pages about subjects only if they have a certain popularity, such graphs tend to lack information about less well-known entities. Information about these entities is oftentimes available in the encyclopedia, but not represented as an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia’s list pages, which have proven to serve as a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we extract training data for the identification of new entities in list pages that we use in the second phase to train a classification model. With this approach we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
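
The abstract describes a two-phase idea: distant supervision against an existing knowledge graph produces training labels for list-page entries, and a classifier then flags unlinked entries as new entities of the list's type. The following is a minimal, hypothetical sketch of that idea only; the toy data, feature encoding, and the use of scikit-learn's RandomForestClassifier are illustrative assumptions and do not reproduce the authors' actual features or model.

    # Hypothetical sketch: distant supervision labels list-page entries via an
    # existing KG type, then a classifier predicts whether unlinked entries are
    # entities of the list's expected type. All data and names are illustrative.
    from sklearn.ensemble import RandomForestClassifier

    # Toy "list page" entries: text, linked KG type (or None), and simple features
    # (e.g. position in the line, bold formatting flag, token count).
    entries = [
        {"text": "Foo City",    "kg_type": "dbo:City", "feat": [0, 1, 2]},
        {"text": "Bar Town",    "kg_type": "dbo:City", "feat": [0, 1, 2]},
        {"text": "1987",        "kg_type": "dbo:Year", "feat": [2, 0, 1]},
        {"text": "Baz Village", "kg_type": None,       "feat": [0, 1, 2]},  # unlinked
    ]
    expected_type = "dbo:City"  # would come from the taxonomy built in phase one

    # Phase 1 (distant supervision): entries linked to the expected type become
    # positive examples, entries linked to a conflicting type become negatives.
    X_train, y_train, unlabeled = [], [], []
    for e in entries:
        if e["kg_type"] == expected_type:
            X_train.append(e["feat"]); y_train.append(1)
        elif e["kg_type"] is not None:
            X_train.append(e["feat"]); y_train.append(0)
        else:
            unlabeled.append(e)

    # Phase 2: train a classifier and apply it to the unlinked entries.
    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)
    for e in unlabeled:
        if clf.predict([e["feat"]])[0] == 1:
            print(f"new entity candidate: {e['text']} rdf:type {expected_type}")
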
This entry is part of the university bibliography.