Graph-boosted active learning for multi-source entity resolution

Primpeli, Anna ; Bizer, Christian

Document Type: Conference or workshop publication
Year of publication: 2021
Book title: The Semantic Web – ISWC 2021 : 20th international semantic web conference, ISWC 2021, virtual event, October 24–28, 2021, proceedings
Journal or publication series: Lecture Notes in Computer Science
Volume: 12922
Page range: 182-199
Conference title: ISWC 2021 (20, 2021)
Location of the conference venue: online
Date of the conference: October 24–28, 2021
Editors: Hotho, Andreas ; Blomqvist, Eva ; Dietze, Stefan ; Fokoue, Achille ; Ding, Ying ; Barnaghi, Payam ; Haller, Armin ; Dragoni, Mauro ; Alani, Harith
Place of publication: Berlin [et al.]
Publisher: Springer
ISBN: 978-3-030-88360-7 , 978-3-030-88362-1
ISSN: 0302-9743 , 1611-3349
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems V: Web-based Systems (Bizer 2012-)
Subject: 004 Computer science, internet
Keywords (English): entity resolution , link discovery , multi-source entity matching , active learning
Abstract: Supervised entity resolution methods rely on labeled record pairs for learning matching patterns between two or more data sources. Active learning minimizes the labeling effort by selecting informative pairs for labeling. The existing active learning methods for entity resolution all target two-source matching scenarios and ignore signals that only exist in multi-source settings, such as the Web of Data. In this paper, we propose ALMSER, a graph-boosted active learning method for multi-source entity resolution. To the best of our knowledge, ALMSER is the first active learning-based entity resolution method that is especially tailored to the multi-source setting. ALMSER exploits the rich correspondence graph that exists in multi-source settings for selecting informative record pairs. In addition, the correspondence graph is used to derive complementary training data. We evaluate our method using five multi-source matching tasks having different profiling characteristics. The experimental evaluation shows that leveraging graph signals leads to improved results over active learning methods using margin-based and committee-based query strategies in terms of F1 score on all tasks.
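
To illustrate how a correspondence graph can yield complementary training data, here is a minimal Python sketch. It assumes that matching is transitive, so that labeled matches A-B and B-C imply a match A-C. The function name `transitive_matches` and the record identifiers are hypothetical, and the sketch is not the ALMSER implementation; it only shows one graph signal that is unavailable in a two-source setting.

```python
from itertools import combinations


def transitive_matches(labeled_matches):
    """Return record pairs implied by transitivity but not labeled directly.

    labeled_matches: iterable of (record_a, record_b) pairs confirmed as matches.
    Assumption: matching is transitive, so each connected component of the
    correspondence graph describes one real-world entity.
    """
    labeled_matches = list(labeled_matches)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every labeled matching pair.
    for a, b in labeled_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group records by the component they belong to.
    clusters = {}
    for record in list(parent):
        clusters.setdefault(find(record), []).append(record)

    # Every unlabeled pair inside a component is a derived match.
    labeled = {frozenset(pair) for pair in labeled_matches}
    derived = set()
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in labeled:
                derived.add((a, b))
    return derived


# Hypothetical records from three sources A, B and C.
print(transitive_matches([("A:1", "B:7"), ("B:7", "C:3")]))
# {('A:1', 'C:3')}
```

In this toy setting the derived pair could be added to the training set without a further labeling query. A real system would also have to handle noisy labels, which a plain transitive closure propagates.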

This entry is part of the university bibliography.
