Using ChatGPT for Entity Matching

Peeters, Ralph ; Bizer, Christian

Document Type: Conference or workshop publication
Year of publication: 2023
Book title: New Trends in Database and Information Systems : ADBIS 2023 short papers, doctoral consortium and workshops: AIDMA, DOING, K-Gals, MADEISD, PeRS, Barcelona, Spain, September 4-7, 2023, Proceedings
Journal or publication series: Communications in Computer and Information Science
Volume: 1850
Page range: 221-230
Conference title: ADBIS 2023
Location of the conference venue: Barcelona, Spain
Date of the conference: September 4-7, 2023
Editors: Abelló, Alberto ; Bugiotti, Francesca ; Gamper, Johann ; Romero, Oscar ; Vargas Solar, Genoveva ; Vassiliadis, Panos ; Wrembel, Robert ; Zumpano, Ester
Place of publication: Cham
Publisher: Springer
ISBN: 978-3-031-42940-8, 978-3-031-42941-5
ISSN: 1865-0929, 1865-0937
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science, internet
Keywords (English): Entity Matching, Large Language Models, ChatGPT
Abstract: Entity Matching is the task of deciding if two entity descriptions refer to the same real-world entity. State-of-the-art entity matching methods often rely on fine-tuning Transformer models such as BERT or RoBERTa. Two major drawbacks of using these models for entity matching are that (i) the models require significant amounts of fine-tuning data for reaching a good performance and (ii) the fine-tuned models are not robust concerning out-of-distribution entities. In this paper, we investigate using ChatGPT for entity matching as a more robust, training data-efficient alternative to traditional Transformer models. We perform experiments along three dimensions: (i) general prompt design, (ii) in-context learning, and (iii) provision of higher-level matching knowledge. We show that ChatGPT is competitive with a fine-tuned RoBERTa model, reaching a zero-shot performance of 82.35% F1 on a challenging matching task on which RoBERTa requires 2000 training examples for reaching a similar performance. Adding in-context demonstrations to the prompts further improves the F1 by up to 7.85% when using similarity-based example selection. Always using the same set of 10 handpicked demonstrations leads to an improvement of 4.92% over the zero-shot performance. Finally, we show that ChatGPT can also be guided by adding higher-level matching knowledge in the form of rules to the prompts. Providing matching rules leads to similar performance gains as providing in-context demonstrations.
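
The three prompt variants named in the abstract (zero-shot, in-context demonstrations chosen by similarity, and matching rules) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the prompt wording, the use of token-level Jaccard similarity for example selection, the function names, and the sample product data are all hypothetical. The sketch only builds the prompt text and does not call any model.

```python
import re


def tokens(text):
    """Return the set of lowercased word tokens in an entity description."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two strings' token sets (an assumed similarity measure)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_demonstrations(pair, pool, k=2):
    """Pick the k labeled pairs in the pool that are most similar to the query pair."""
    query = pair[0] + " " + pair[1]
    ranked = sorted(pool, key=lambda d: jaccard(query, d["a"] + " " + d["b"]), reverse=True)
    return ranked[:k]


def build_prompt(pair, demonstrations=(), rules=()):
    """Assemble a prompt; with no demonstrations and no rules it is zero-shot."""
    lines = []
    if rules:
        lines.append("Matching rules:")
        lines.extend("- " + rule for rule in rules)
    lines.append("Do the two entity descriptions refer to the same real-world entity? Answer Yes or No.")
    for d in demonstrations:
        lines.append("Entity 1: " + d["a"])
        lines.append("Entity 2: " + d["b"])
        lines.append("Answer: " + ("Yes" if d["match"] else "No"))
    lines.append("Entity 1: " + pair[0])
    lines.append("Entity 2: " + pair[1])
    lines.append("Answer:")
    return "\n".join(lines)


# Hypothetical labeled pool and query pair, invented for this example.
pool = [
    {"a": "Canon EOS 90D DSLR camera body", "b": "Canon EOS 90D body only", "match": True},
    {"a": "Logitech MX Master 3 mouse", "b": "Logitech MX Anywhere 3 mouse", "match": False},
]
query = ("Canon EOS 80D DSLR camera", "Canon EOS 80D kit with lens")

zero_shot_prompt = build_prompt(query)
demos = select_demonstrations(query, pool, k=1)
guided_prompt = build_prompt(query, demos, rules=["Model numbers must be identical."])
```

Keeping selection and prompt assembly in separate functions makes it easy to swap the similarity-based selection for a fixed set of handpicked demonstrations, the other setting mentioned in the abstract.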

This entry is part of the university bibliography.
