Integrating product data using deep learning: Article No. 11

PDF:	bwHPC7_11.pdf - Published version (349 kB)

DOI:	https://doi.org/10.18725/OPARU-46067
URL:	https://oparu.uni-ulm.de/xmlui/handle/123456789/46...
URN:	urn:nbn:de:bsz:180-madoc-635049
Document type:	Conference publication
Year of publication:	2022
Book title:	Proceedings of the 7th bwHPC Symposium
Page range:	59-62
Event title:	7th bwHPC Symposium: High Performance Computing in Baden-Württemberg
Event location:	Ulm, Germany, Online
Event date:	8 November 2021
Place of publication:	Ulm
Publisher:	Universität Ulm
ISBN:	978-3-948303-29-7
Related URLs:	https://indico.scc.kit.edu/event/2399/ov...
Language of publication:	English
Institution:	Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems V: Web-based Systems (Bizer 2012-)
Existing license:	Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject:	004 Computer science
Keywords (English):	deep learning, product matching, data integration
Abstract:	Product matching is the task of deciding whether two product descriptions refer to the same real-world product. It is a central task in e-commerce applications such as online marketplaces and price comparison portals, as these applications need to find out which offers refer to the same product before they can integrate data from the offers or compare product prices. Product matching is non-trivial because merchants describe products in different ways and because small differences in the product descriptions matter for distinguishing between different variants of the same product. A successful approach for dealing with the heterogeneity of product offers is to combine deep learning-based matching techniques with large amounts of training data, which can be extracted from Web corpora such as the Common Crawl. Training deep learning methods involving millions of parameters for use cases such as product matching requires access to large compute resources. In this extended abstract, we report how we trained different RNN- and BERT-based models for product matching using the bwHPC infrastructure and how this extended training allowed us to reach peak performance. Afterwards, we describe how we use the bwHPC infrastructure for our ongoing research on table representation learning for data integration.