Using ChatGPT for Entity Matching

DOI:	https://doi.org/10.1007/978-3-031-42941-5_20
URL:	https://link.springer.com/chapter/10.1007/978-3-03...
Additional URL:	https://www.researchgate.net/publication/370594448...
Document type:	Conference publication
Year of publication:	2023
Book title:	New Trends in Database and Information Systems: ADBIS 2023 short papers, doctoral consortium and workshops: AIDMA, DOING, K-Gals, MADEISD, PeRS, Barcelona, Spain, September 4-7, 2023, Proceedings
Journal or series title:	Communications in Computer and Information Science
Volume:	1850
Page range:	221-230
Event title:	ADBIS 2023
Event location:	Barcelona, Spain
Event date:	September 4-7, 2023
Editors:	Abelló, Alberto; Bugiotti, Francesca; Gamper, Johann; Romero, Oscar; Vargas Solar, Genoveva; Vassiliadis, Panos; Wrembel, Robert; Zumpano, Ester
Place of publication:	Cham
Publisher:	Springer
ISBN:	978-3-031-42940-8, 978-3-031-42941-5
ISSN:	1865-0929, 1865-0937
Publication language:	English
Institution:	School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject:	004 Computer science
Keywords (English):	Entity Matching, Large Language Models, ChatGPT
Abstract:	Entity Matching is the task of deciding if two entity descriptions refer to the same real-world entity. State-of-the-art entity matching methods often rely on fine-tuning Transformer models such as BERT or RoBERTa. Two major drawbacks of using these models for entity matching are that (i) the models require significant amounts of fine-tuning data to reach good performance and (ii) the fine-tuned models are not robust to out-of-distribution entities. In this paper, we investigate using ChatGPT for entity matching as a more robust, training data-efficient alternative to traditional Transformer models. We perform experiments along three dimensions: (i) general prompt design, (ii) in-context learning, and (iii) provision of higher-level matching knowledge. We show that ChatGPT is competitive with a fine-tuned RoBERTa model, reaching a zero-shot performance of 82.35% F1 on a challenging matching task on which RoBERTa requires 2000 training examples to reach similar performance. Adding in-context demonstrations to the prompts further improves F1 by up to 7.85% when using similarity-based example selection. Always using the same set of 10 handpicked demonstrations leads to an improvement of 4.92% over the zero-shot performance. Finally, we show that ChatGPT can also be guided by adding higher-level matching knowledge in the form of rules to the prompts. Providing matching rules leads to similar performance gains as providing in-context demonstrations.
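
The three experimental dimensions named in the abstract (prompt design, in-context demonstrations, and matching rules) can be illustrated with a minimal Python sketch. This is not the authors' code or their prompt wording: the function names, the prompt text, and the use of token-level Jaccard similarity for example selection are all assumptions made for illustration, since the abstract does not say which similarity measure was used. The call to a chat model is deliberately left out; the sketch only assembles the prompt string.

```python
from typing import List, Sequence, Tuple

# Hypothetical demonstration format: (description A, description B, is_match)
Demo = Tuple[str, str, bool]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_demos(a: str, b: str, pool: List[Demo], k: int) -> List[Demo]:
    """Similarity-based example selection: keep the k demonstrations from the
    pool whose combined text is most similar to the query pair."""
    query = f"{a} {b}"
    ranked = sorted(
        pool,
        key=lambda d: jaccard(query, f"{d[0]} {d[1]}"),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(
    a: str,
    b: str,
    demos: Sequence[Demo] = (),
    rules: Sequence[str] = (),
) -> str:
    """Assemble a matching prompt. With no demos and no rules this is a
    zero-shot prompt; the wording is illustrative only."""
    lines: List[str] = []
    if rules:
        # Higher-level matching knowledge, supplied as plain-text rules.
        lines.append("Matching rules:")
        lines.extend(f"- {rule}" for rule in rules)
    for demo_a, demo_b, is_match in demos:
        # In-context demonstrations: solved examples placed before the query.
        lines.append(f"Entity 1: {demo_a}")
        lines.append(f"Entity 2: {demo_b}")
        lines.append(f"Answer: {'Yes' if is_match else 'No'}")
    lines.append(
        "Do the following two entity descriptions refer to the same "
        "real-world entity? Answer with Yes or No."
    )
    lines.append(f"Entity 1: {a}")
    lines.append(f"Entity 2: {b}")
    lines.append("Answer:")
    return "\n".join(lines)
```

Calling build_prompt(a, b) alone yields a zero-shot prompt. Passing the output of select_demos as demos corresponds to similarity-based example selection, passing a fixed list corresponds to handpicked demonstrations, and passing rules adds higher-level matching knowledge. The resulting string would then be sent to a chat model of the reader's choice.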