Robust active learning of expressive linkage rules
Primpeli, Anna; Bizer, Christian
DOI: https://doi.org/10.1145/3326467.3326484
URL: https://dl.acm.org/citation.cfm?doid=3326467.33264...
Document Type: Conference or workshop publication
Year of publication: 2019
Book title: WIMS2019 : Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, Seoul, Republic of Korea, June 26 - 28, 2019
Page range: 2:1-2:7
Conference title: WIMS 2019
Location of the conference venue: Seoul, Republic of Korea
Date of the conference: June 26-28, 2019
Editor: Akerkar, Rajendra
Place of publication: New York, NY
Publishing house: ACM
ISBN: 978-1-4503-6190-3
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science, internet
Individual keywords (German): Active Learning, Entity Resolution, Genetic Programming, Missing Values, Sparse Data, Web Data
Abstract:
The goal of entity resolution, also known as duplicate detection and record linkage, is to identify all records in one or more data sets that refer to the same real-world entity. To achieve this goal, matching rules, which encode the matching patterns in the data, can be learned with the help of manually annotated record pairs. Active learning for entity resolution aims to minimize the human labeling effort by including the human in the learning loop and by selecting the most informative pairs for labeling. While active learning methods are quite successful at reducing the human labeling effort, we show that their performance decreases when they are applied to data sets with a large number of sparse attributes. We evaluate the ActiveGenLink active learning method using e-commerce data sets with such characteristics and observe that it is prone to suboptimal convergence points, thus producing highly varying results in different runs of the same experiment. In this paper, we present our ongoing work on building a robust active learning method that is able to tackle this instability. Our method applies unsupervised matching of the record pairs as a first step. The unsupervised matching results are then used for bootstrapping the active learning process and for preventing it from converging to suboptimal matching rules. The evaluation shows that the proposed method increases the robustness of the active learning process, as it reduces the variation of the results of different runs by 10% to 16%.
This entry is part of the university bibliography.