OPIEC: An open information extraction corpus
Gashteovski, Kiril; Wanner, Sebastian; Hertling, Sven; Broscheit, Samuel; Gemulla, Rainer
PDF: OPIEC An Open Information Extraction Corpus.pdf (published version, 1 MB)
URL: https://madoc.bib.uni-mannheim.de/48226
Additional URL: https://openreview.net/forum?id=HJxeGb5pTm
URN: urn:nbn:de:bsz:180-madoc-482261
Document Type: Conference or workshop publication
Year of publication: 2019
Book title: AKBC 2019: 1st Conference on Automated Knowledge Base Construction (AKBC), May 20-22, 2019, Monday-Wednesday, Amherst, MA
Page range: 1-19
Conference title: AKBC 2019
Location of the conference venue: Amherst, MA
Date of the conference: May 20-22, 2019
Place of publication: Amherst, MA
Publishing house: OpenReview.net
Publication language: English
Institution: School of Business Informatics and Mathematics > Practical Computer Science I: Data Analytics (Gemulla 2014-)
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 004 Computer science, internet
Abstract:

Open information extraction (OIE) systems extract relations and their arguments from natural language text in an unsupervised manner. The resulting extractions are a valuable resource for downstream tasks such as knowledge base construction, open question answering, or event schema induction. In this paper, we release, describe, and analyze an OIE corpus called OPIEC, which was extracted from the text of English Wikipedia. OPIEC complements the available OIE resources: It is the largest OIE corpus publicly available to date (over 340M triples) and contains valuable metadata such as provenance information, confidence scores, linguistic annotations, and semantic annotations including spatial and temporal information. We analyze the OPIEC corpus by comparing its content with knowledge bases such as DBpedia or YAGO, which are also based on Wikipedia. We found that most of the facts between entities present in OPIEC cannot be found in DBpedia and/or YAGO, that OIE facts often differ in the level of specificity compared to knowledge base facts, and that OIE open relations are generally highly polysemous. We believe that the OPIEC corpus is a valuable resource for future research on automated knowledge base construction.
This entry is part of the university bibliography.
The document is provided by the publication server of the Mannheim University Library.
ORCID: Gashteovski, Kiril; Wanner, Sebastian; Hertling, Sven (https://orcid.org/0000-0003-0333-5888); Broscheit, Samuel; Gemulla, Rainer (https://orcid.org/0000-0003-2762-0050)