Synthesizing N-ary relations from web tables


Lehmberg, Oliver ; Bizer, Christian



DOI: https://doi.org/10.1145/3326467.3326480
URL: https://dl.acm.org/citation.cfm?doid=3326467.33264...
Document type: Conference publication
Year of publication: 2019
Book title: WIMS2019 : Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, Seoul, Republic of Korea, June 26 - 28, 2019
Page range: 17:1-17:12
Event title: WIMS 2019
Event location: Seoul, Republic of Korea
Event date: June 26-28, 2019
Editor: Akerkar, Rajendra
Place of publication: New York, NY, USA
Publisher: ACM
ISBN: 978-1-4503-6190-3
Language of publication: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science
Uncontrolled keywords (German): Schema Extension, Schema Matching, Web Tables
Abstract: The Web contains a large number of relational HTML tables, which cover a multitude of different, often very specific topics. This rich pool of data has motivated a growing body of research on methods that use web table data to extend local tables with additional attributes or add missing facts to knowledge bases. Nearly all existing approaches for these tasks are limited to the extraction of binary relations from web tables, e.g. an unemployment number may only depend on the state. Inspecting randomly chosen tables on the Web quickly reveals that many relations in the tables are non-binary, e.g. unemployment numbers also depend on the point in time and the profession. Treating such n-ary relations as binary leads to data that cannot be interpreted correctly. The extraction of n-ary relations from web tables is complicated by two factors: 1. important attributes might be stated outside of the table; 2. relational web tables are usually too small for functional dependency discovery. This paper presents a method to synthesize n-ary relations from web tables for the use case of knowledge base extension. The method exploits information from the page around the table and stitches (combines) multiple tables from the same website. We apply the method to a corpus of 5 million web tables originating from 80 thousand different web sites and find that 38% of the synthesized relations are non-binary. We find different relations for the same dependent attribute, e.g. relations providing unemployment numbers based on time, location, or profession. By identifying groups of websites which provide these relations, we lay the foundation for applications in knowledge base augmentation and data search, which allow for a specific selection of relations that determine an attribute according to the applications' data requirements.
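Illustration (not part of the catalog record or the paper): the abstract's central point, that a relation read as binary can become uninterpretable until context attributes are added, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' method: the helper names `stitch` and `fd_holds`, the table layout, and all data values are made up for illustration. It unions two small tables from one hypothetical website, attaches the year found in the page context, and then checks which determinant actually fixes the unemployment value.

```python
def stitch(tables):
    """Union tables that share a schema, adding each table's page context as extra columns.

    `tables` is a list of (context, rows) pairs; this layout is an assumption
    made for the sketch, not a format defined by the paper.
    """
    stitched = []
    for context, rows in tables:
        for row in rows:
            stitched.append({**row, **context})
    return stitched


def fd_holds(rows, determinant, dependent):
    """Return True if the determinant columns uniquely fix the dependent column."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


# Two hypothetical tables from the same website; the year is stated only on the page,
# outside the table itself. All numbers are invented for illustration.
tables = [
    ({"year": 2017}, [{"state": "Ohio", "unemployment": 5.0},
                      {"state": "Utah", "unemployment": 3.2}]),
    ({"year": 2018}, [{"state": "Ohio", "unemployment": 4.6},
                      {"state": "Utah", "unemployment": 3.1}]),
]

rows = stitch(tables)
binary_ok = fd_holds(rows, ["state"], "unemployment")        # state alone
nary_ok = fd_holds(rows, ["state", "year"], "unemployment")  # state plus year
```

In this toy data, `state` alone does not determine `unemployment` once the tables are combined, while `state` together with `year` does, which mirrors the abstract's observation that single small tables hide the dependency and that page context is needed to recover it.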




This entry is part of the university bibliography.



