Fusing time-dependent web table data
Oulabi, Yaser
;
Meusel, Robert
;
Bizer, Christian
DOI:
|
https://doi.org/10.1145/2932194.2932197
|
URL:
|
http://doi.acm.org/10.1145/2932194.2932197
|
Dokumenttyp:
|
Konferenzveröffentlichung
|
Erscheinungsjahr:
|
2016
|
Buchtitel:
|
WebDB '16 : Proceedings of the 19th International Workshop on Web and Databases, San Francisco, CA, USA, June 26, 2016 : co-located with ACM SIGMOD 2016
|
Seitenbereich:
|
Article 3, 1-7
|
Veranstaltungstitel:
|
19th International Workshop on Web and Databases, WebDB '16
|
Veranstaltungsort:
|
San Francisco, CA
|
Veranstaltungsdatum:
|
June 26th, 2016
|
Ort der Veröffentlichung:
|
New York, NY
|
Verlag:
|
ACM
|
ISBN:
|
978-1-4503-4310-7
|
Sprache der Veröffentlichung:
|
Englisch
|
Einrichtung:
|
Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems V: Web-based Systems (Bizer 2012-)
|
Fachgebiet:
|
004 Informatik
|
Freie Schlagwörter (Englisch):
|
Data Fusion , Conflict Resolution , Web Tables , Web Data
|
Abstract:
|
A subset of the HTML tables on the Web contains relational data. The data in these tables covers a multitude of topics and is thus very useful for complementing or validating cross-domain knowledge bases, such as DBpedia, YAGO, or the Google Knowledge Graph. A large fraction of the data in these knowledge bases is time-dependent, meaning that the correctness of an attribute value depends on a point in time. Fusing data from web tables in order to determine correct values for time-dependent attributes is challenging as most web tables do not contain timestamp information. A possibility to deal with this sparsity is to exploit timestamps which appear in different locations on the web page around the table. But as these timestamps might not apply to the web table value in question, this approach introduces noise. This paper investigates the extent to which the performance of data fusion strategies that rely on voting, PageRank, and Knowledge-Based-Trust can be improved by incorporating noisy and sparse timestamp information. For this, we present a machine-learning-based approach which considers different types of noisy timestamps in the data fusion process, and experiment with propagating timestamp information between web tables in order to overcome sparsity. We evaluate the data fusion strategies using a large public corpus of web tables and a public gold standard of time-dependent attribute values. We find that our methods effectively choose and weigh timestamp information per attribute and reduce sparsity using propagation. By incorporating timestamp information into data fusion strategies that previously did not exploit temporal meta information, we are able to increase F1-measure on average by 5%.
|
| Dieser Eintrag ist Teil der Universitätsbibliographie. |
Suche Autoren in
Sie haben einen Fehler gefunden? Teilen Sie uns Ihren Korrekturwunsch bitte hier mit: E-Mail
Actions (login required)
|
Eintrag anzeigen |
|
|