Profiling entity matching benchmark tasks
Primpeli, Anna; Bizer, Christian
DOI: https://doi.org/10.1145/3340531.3412781
URL: https://dl.acm.org/doi/10.1145/3340531.3412781
Document Type: Conference or workshop publication
Year of publication: 2020
Book title: CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
Page range: 3101-3108
Conference title: CIKM '20
Location of the conference venue: Online
Date of the conference: October 19-23, 2020
Editor: D'Aquin, Mathieu
Place of publication: New York, NY
Publishing house: Association for Computing Machinery
ISBN: 978-1-4503-6859-9
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science, internet
Keywords (English): profiling, baseline evaluation, reproducibility, entity matching, benchmarking
Abstract: Entity matching is a central task in data integration which has been researched for decades. Over this time, a wide range of benchmark tasks for evaluating entity matching methods has been developed. This resource paper systematically complements, profiles, and compares 21 entity matching benchmark tasks. In order to better understand the specific challenges associated with different tasks, we define a set of profiling dimensions which capture central aspects of the matching tasks. Using these dimensions, we create groups of benchmark tasks having similar characteristics. Afterwards, we assess the difficulty of the tasks in each group by computing baseline evaluation results using standard feature engineering together with two common classification methods. In order to enable the exact reproducibility of evaluation results, matching tasks need to contain exactly defined sets of matching and non-matching record pairs, as well as a fixed development and test split. As this is not the case for some widely used benchmark tasks, we complement these tasks with fixed sets of non-matching pairs, as well as fixed splits, and provide the resulting development and test sets for public download. By profiling and complementing the benchmark tasks, we support researchers in selecting challenging as well as diverse tasks and in comparing matching systems on clearly defined grounds.
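To make the baseline protocol described in the abstract concrete, the sketch below runs standard feature engineering over labeled record pairs, trains two common classifiers, and evaluates on a fixed development/test split. It is a minimal illustration under assumptions not stated in this record: the file names, the title_left/title_right/label columns, the similarity feature, and the choice of logistic regression and a random forest as the two classifiers are all hypothetical, not the authors' actual setup.

```python
# Minimal sketch of a feature-engineering baseline for an entity matching
# benchmark task. Assumptions (not from the record): record pairs come as
# CSV files with hypothetical columns title_left, title_right, and a binary
# label column (1 = match, 0 = non-match).
from difflib import SequenceMatcher

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score


def string_similarity(a: str, b: str) -> float:
    """Simple character-level similarity in [0, 1]."""
    return SequenceMatcher(None, str(a), str(b)).ratio()


def featurize(pairs: pd.DataFrame) -> pd.DataFrame:
    """Turn record pairs into a small numeric feature table."""
    return pd.DataFrame({
        "title_sim": [
            string_similarity(left, right)
            for left, right in zip(pairs["title_left"], pairs["title_right"])
        ],
        "exact_match": (pairs["title_left"] == pairs["title_right"]).astype(int),
    })


# Fixed development/test split: the benchmark ships the two files, so no
# random splitting happens at evaluation time and results are reproducible.
dev = pd.read_csv("task_dev_pairs.csv")    # hypothetical file name
test = pd.read_csv("task_test_pairs.csv")  # hypothetical file name

X_dev, y_dev = featurize(dev), dev["label"]
X_test, y_test = featurize(test), test["label"]

# Two common classification methods as baselines, reported via F1 on the
# fixed test set.
for clf in (LogisticRegression(max_iter=1000),
            RandomForestClassifier(n_estimators=100, random_state=42)):
    clf.fit(X_dev, y_dev)
    f1 = f1_score(y_test, clf.predict(X_test))
    print(f"{clf.__class__.__name__}: F1 = {f1:.3f}")
```

In practice, such a baseline would use a richer feature set (e.g. several string similarity measures per attribute), but the structure above is what makes the reported results reproducible: the pair sets and the split are fixed, so only the features and classifiers vary.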
This entry is part of the university bibliography.