Fusing time-dependent web table data


Oulabi, Yaser ; Meusel, Robert ; Bizer, Christian



DOI: https://doi.org/10.1145/2932194.2932197
URL: http://doi.acm.org/10.1145/2932194.2932197
Document type: Conference publication
Year of publication: 2016
Book title: WebDB '16 : Proceedings of the 19th International Workshop on Web and Databases, San Francisco, CA, USA, June 26, 2016 : co-located with ACM SIGMOD 2016
Page range: Article 3, 1-7
Event title: 19th International Workshop on Web and Databases, WebDB '16
Event location: San Francisco, CA
Event date: June 26, 2016
Place of publication: New York, NY
Publisher: ACM
ISBN: 978-1-4503-4310-7
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science
Keywords (English): Data Fusion, Conflict Resolution, Web Tables, Web Data
Abstract: A subset of the HTML tables on the Web contains relational data. The data in these tables covers a multitude of topics and is thus very useful for complementing or validating cross-domain knowledge bases, such as DBpedia, YAGO, or the Google Knowledge Graph. A large fraction of the data in these knowledge bases is time-dependent, meaning that the correctness of an attribute value depends on a point in time. Fusing data from web tables in order to determine correct values for time-dependent attributes is challenging as most web tables do not contain timestamp information. A possibility to deal with this sparsity is to exploit timestamps which appear in different locations on the web page around the table. But as these timestamps might not apply to the web table value in question, this approach introduces noise. This paper investigates the extent to which the performance of data fusion strategies that rely on voting, PageRank, and Knowledge-Based-Trust can be improved by incorporating noisy and sparse timestamp information. For this, we present a machine-learning-based approach which considers different types of noisy timestamps in the data fusion process, and experiment with propagating timestamp information between web tables in order to overcome sparsity. We evaluate the data fusion strategies using a large public corpus of web tables and a public gold standard of time-dependent attribute values. We find that our methods effectively choose and weigh timestamp information per attribute and reduce sparsity using propagation. By incorporating timestamp information into data fusion strategies that previously did not exploit temporal meta information, we are able to increase F1-measure on average by 5%.
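Illustration: the abstract describes fusing conflicting web table values with voting-style strategies that also take noisy, sparse timestamps into account. The sketch below is not the authors' method (the paper learns how to choose and weight timestamps per attribute with machine learning); it only shows the general idea with a hand-set decay. All names (`Claim`, `temporal_weight`, `fuse`), the `decay` and `missing_weight` parameters, and the example values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One candidate value for an attribute, taken from one web table."""
    value: str
    source_score: float        # assumption: 1.0 for plain voting, or a trust score
    timestamp: Optional[int]   # assumption: a year found near the table, None if absent


def temporal_weight(timestamp: Optional[int], target_year: int,
                    missing_weight: float = 0.5, decay: float = 0.5) -> float:
    """Hand-set weight: claims dated close to the target year count more."""
    if timestamp is None:
        # Sparse case: no timestamp available, fall back to a neutral weight.
        return missing_weight
    return 1.0 / (1.0 + decay * abs(timestamp - target_year))


def fuse(claims: list[Claim], target_year: int) -> str:
    """Weighted vote: sum source score times temporal weight per value."""
    scores: dict[str, float] = defaultdict(float)
    for claim in claims:
        scores[claim.value] += claim.source_score * temporal_weight(
            claim.timestamp, target_year)
    return max(scores, key=scores.get)


# Hypothetical example: three older tables agree on one value, two recent ones on another.
claims = [
    Claim("8.1M", 1.0, 2005),
    Claim("8.1M", 1.0, 2005),
    Claim("8.1M", 1.0, 2005),
    Claim("8.5M", 1.0, 2015),
    Claim("8.5M", 1.0, 2015),
]
fused_value = fuse(claims, target_year=2015)
```

In this example a plain majority vote would favor the older value, while the time-aware vote favors the value whose timestamps match the target year. A learned, per-attribute weighting as described in the abstract would replace the fixed `decay` and `missing_weight` constants.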




This entry is part of the university bibliography.




ORCID (Bizer, Christian): 0000-0003-2367-0237
