Using schema.org annotations for training and maintaining product matchers
Peeters, Ralph; Primpeli, Anna; Wichtlhuber, Benedikt; Bizer, Christian
DOI: https://doi.org/10.1145/3405962.3405964
URL: https://dl.acm.org/doi/10.1145/3405962.3405964
Additional URL: https://dl.acm.org/doi/proceedings/10.1145/3405962
Document type: Conference publication
Year of publication: 2020
Book title: WIMS 2020: proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics, Biarritz, France, June 30 - July 3, 2020
Page range: 195-204
Event title: WIMS 2020
Event location: Online
Event dates: 30 June to 3 July 2020
Editor: Chbeir, Richard
Place of publication: New York, NY
Publisher: ACM
ISBN: 978-1-4503-7542-9
Language of publication: English
Institution: Faculty of Business Informatics and Business Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject area: 004 Computer science
Keywords (English): e-commerce, schema.org, product matching, semantic web, neural networks
Abstract:
Product matching is a central task in e-commerce applications such as price comparison portals and online marketplaces. State-of-the-art product matching methods achieve F1 scores above 0.90 by combining deep learning techniques with large amounts of training data (e.g., more than 100K pairs of offers). Gathering and maintaining such large training corpora is costly, as it requires labeling pairs of offers as matches or non-matches. Building strong product matching capabilities therefore represents a major investment for an e-commerce company. This paper shows that the manual labeling of training data for product matching can be replaced by relying exclusively on schema.org annotations gathered from the public Web. Using only schema.org data for training, we achieve F1 scores between 0.92 and 0.95, depending on the product category. As new products appear every day, matching models must be maintainable with justifiable effort. To give practical advice on model maintenance, we compare the performance of deep learning and traditional matching models on unseen products and experiment with different fine-tuning and re-training strategies, again using only schema.org annotations as training data. Finally, since using the public Web for distant supervision introduces inherent noise, we evaluate deep learning and traditional matching models with regard to their resistance to label noise and show that deep learning can cope with the amount of identifier noise found in schema.org annotations.
Additional information: Online resource
This entry is part of the University Bibliography.
ORCID:
Peeters, Ralph: https://orcid.org/0000-0003-3174-2616
Primpeli, Anna: https://orcid.org/0000-0002-1783-2482
Bizer, Christian: https://orcid.org/0000-0003-2367-0237