Integrating product data using deep learning : Article No. 11

Peeters, Ralph ; Bizer, Christian

PDF: bwHPC7_11.pdf (published version, 349 kB)

URN: urn:nbn:de:bsz:180-madoc-635049
Document Type: Conference or workshop publication
Year of publication: 2022
Book title: Proceedings of the 7th bwHPC Symposium
Page range: 59-62
Conference title: 7th bwHPC Symposium: High Performance Computing in Baden-Württemberg
Location of the conference venue: Ulm, Germany, Online
Date of the conference: 8 November 2021
Place of publication: Ulm
Publishing house: Universität Ulm
ISBN: 978-3-948303-29-7
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Pre-existing license: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 004 Computer science, internet
Keywords (English): deep learning, product matching, data integration
Abstract: Product matching is the task of deciding whether two product descriptions refer to the same real-world product. It is a central task in e-commerce applications such as online marketplaces and price comparison portals, which need to determine which offers refer to the same product before they can integrate data from the offers or compare prices. The task is non-trivial because merchants describe products in different ways and because small differences between descriptions matter for distinguishing variants of the same product. A successful approach to handling this heterogeneity is to combine deep learning-based matching techniques with large amounts of training data extracted from Web corpora such as the Common Crawl. Training deep learning models with millions of parameters for use cases such as product matching requires access to large compute resources. In this extended abstract, we report how we trained different RNN- and BERT-based models for product matching using the bwHPC infrastructure and how this extended training allowed us to reach peak performance. Afterwards, we describe how we use the bwHPC infrastructure for our ongoing research on table representation learning for data integration.

This entry is part of the university bibliography.

The document is provided by the publication server of the Mannheim University Library.
