Using ChatGPT for Entity Matching
Peeters, Ralph ; Bizer, Christian
DOI: https://doi.org/10.1007/978-3-031-42941-5_20
URL: https://link.springer.com/chapter/10.1007/978-3-03...
Additional URL: https://www.researchgate.net/publication/370594448...
Document Type: Conference or workshop publication
Year of publication: 2023
Book title: New Trends in Database and Information Systems: ADBIS 2023 Short Papers, Doctoral Consortium and Workshops: AIDMA, DOING, K-Gals, MADEISD, PeRS, Barcelona, Spain, September 4-7, 2023, Proceedings
Journal / publication series: Communications in Computer and Information Science
Volume: 1850
Page range: 221-230
Conference title: ADBIS 2023
Conference venue: Barcelona, Spain
Conference date: September 4-7, 2023
Editors: Abelló, Alberto ; Bugiotti, Francesca ; Gamper, Johann ; Romero, Oscar ; Vargas Solar, Genoveva ; Vassiliadis, Panos ; Wrembel, Robert ; Zumpano, Ester
Place of publication: Cham
Publishing house: Springer
ISBN: 978-3-031-42940-8 , 978-3-031-42941-5 (online)
ISSN: 1865-0929 , 1865-0937
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science, internet
Keywords (English): Entity Matching, Large Language Models, ChatGPT
Abstract:
Entity Matching is the task of deciding if two entity descriptions refer to the same real-world entity. State-of-the-art entity matching methods often rely on fine-tuning Transformer models such as BERT or RoBERTa. Two major drawbacks of using these models for entity matching are that (i) the models require significant amounts of fine-tuning data for reaching a good performance and (ii) the fine-tuned models are not robust concerning out-of-distribution entities. In this paper, we investigate using ChatGPT for entity matching as a more robust, training data-efficient alternative to traditional Transformer models. We perform experiments along three dimensions: (i) general prompt design, (ii) in-context learning, and (iii) provision of higher-level matching knowledge. We show that ChatGPT is competitive with a fine-tuned RoBERTa model, reaching a zero-shot performance of 82.35% F1 on a challenging matching task on which RoBERTa requires 2000 training examples for reaching a similar performance. Adding in-context demonstrations to the prompts further improves the F1 by up to 7.85% when using similarity-based example selection. Always using the same set of 10 handpicked demonstrations leads to an improvement of 4.92% over the zero-shot performance. Finally, we show that ChatGPT can also be guided by adding higher-level matching knowledge in the form of rules to the prompts. Providing matching rules leads to similar performance gains as providing in-context demonstrations.
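The prompting setup described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the prompt wording, the product-matching framing, and the use of `difflib.SequenceMatcher` as the similarity measure for in-context example selection are all assumptions made for the sketch.

```python
from difflib import SequenceMatcher

def zero_shot_prompt(entity_a: str, entity_b: str) -> str:
    """Zero-shot entity matching prompt (wording is illustrative)."""
    return (
        "Do the following two entity descriptions refer to the same "
        "real-world product?\n"
        f"Entity 1: {entity_a}\n"
        f"Entity 2: {entity_b}\n"
        "Answer with 'Yes' or 'No'."
    )

def select_demonstrations(pair, labeled_pool, k=3):
    """Similarity-based selection of in-context demonstrations:
    rank labeled pairs by string similarity to the query pair."""
    query = pair[0] + " " + pair[1]
    def similarity(example):
        candidate = example[0] + " " + example[1]
        return SequenceMatcher(None, query, candidate).ratio()
    return sorted(labeled_pool, key=similarity, reverse=True)[:k]

def few_shot_prompt(pair, labeled_pool, k=3):
    """Prompt with k similarity-selected demonstrations prepended,
    ending with the unanswered query pair."""
    blocks = []
    for a, b, is_match in select_demonstrations(pair, labeled_pool, k):
        blocks.append(
            f"Entity 1: {a}\nEntity 2: {b}\n"
            f"Answer: {'Yes' if is_match else 'No'}"
        )
    blocks.append(f"Entity 1: {pair[0]}\nEntity 2: {pair[1]}\nAnswer:")
    return (
        "Do the following entity descriptions refer to the same "
        "real-world product?\n\n" + "\n\n".join(blocks)
    )
```

The resulting string would then be sent to the chat model; the paper additionally explores appending higher-level matching rules (e.g. which attribute differences matter) as extra prompt lines, which this sketch omits.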
This entry is part of the university bibliography.