Unsupervised bootstrapping of active learning for entity resolution
Primpeli, Anna; Bizer, Christian; Keuper, Margret
DOI: https://doi.org/10.1007/978-3-030-49461-2_13
URL: https://link.springer.com/chapter/10.1007/978-3-03...
Document type: Conference publication
Year of publication: 2020
Book title: The Semantic Web : 17th International Conference, ESWC 2020, Heraklion, Crete, Greece, May 31-June 4, 2020, Proceedings
Title of a journal or series: Lecture Notes in Computer Science
Volume: 12123
Page range: 215-231
Event title: ESWC 2020
Event location: Online
Event date: 31 May - 4 June 2020
Editor: Harth, Andreas
Place of publication: Berlin [et al.]
Publisher: Springer
ISBN: 978-3-030-49460-5, 978-3-030-49462-9, 978-3-030-49461-2
ISSN: 0302-9743, 1611-3349
Language of publication: English
Institution: Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Information Systems V: Web-based Systems (Bizer 2012-); Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Bildverarbeitung (Juniorprofessur) (Keuper 2017-2021)
Subject: 004 Computer science
Abstract:
Entity resolution is one of the central challenges when integrating data from large numbers of data sources. Active learning for entity resolution aims to learn high-quality matching models while minimizing the human labeling effort by selecting only the most informative record pairs for labeling. Most active learning methods proposed so far start with an empty set of labeled record pairs and iteratively improve the prediction quality of a classification model by asking for new labels. The absence of adequate labeled data in the early active learning iterations leads to unstable, low-quality models, which is known as the cold start problem. In our work, we solve the cold start problem by using an unsupervised matching method to bootstrap active learning. We implement a thresholding heuristic that considers pre-calculated similarity scores and assigns matching labels with some degree of noise at no manual labeling cost. The noisy labels are used to initialize the active learning process and, throughout the whole active learning cycle, for model learning and query selection. We evaluate our pipeline on six datasets from three different entity resolution settings using active learning with a committee-based query strategy and show that it successfully deals with the cold start problem. Comparing our method against two active learning baselines without bootstrapping, we show that it can additionally lead to overall improved learned models in terms of F1 score and stability.
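The two ideas named in the abstract, threshold-based noisy labeling and committee-based query selection, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `bootstrap_labels` and `most_disputed`, the fixed threshold of 0.5, the averaging of scores, and the toy record pairs are all assumptions made for illustration. The abstract does not state how the threshold is chosen or how the committee is built.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def bootstrap_labels(
    similarities: Dict[Pair, Sequence[float]],
    threshold: float = 0.5,  # hypothetical fixed cut-off, not taken from the paper
) -> Dict[Pair, Tuple[bool, float]]:
    """Assign a noisy match/non-match label to every record pair.

    Each pair's pre-calculated similarity scores (assumed to lie in [0, 1])
    are averaged; pairs at or above the threshold are labeled as matches.
    The distance from the threshold is kept as a rough confidence weight.
    """
    labels = {}
    for pair, scores in similarities.items():
        aggregated = sum(scores) / len(scores)
        labels[pair] = (aggregated >= threshold, abs(aggregated - threshold))
    return labels


def most_disputed(votes: Dict[Pair, List[bool]]) -> Pair:
    """Pick the pair on which the committee's votes are most evenly split."""

    def disagreement(pair_votes: List[bool]) -> float:
        share = sum(pair_votes) / len(pair_votes)  # fraction voting "match"
        return min(share, 1.0 - share)

    return max(votes, key=lambda pair: disagreement(votes[pair]))


# Toy input: three candidate pairs with three similarity scores each.
sims = {
    ("a1", "b1"): [0.9, 0.8, 1.0],
    ("a2", "b7"): [0.1, 0.3, 0.2],
    ("a3", "b4"): [0.5, 0.6, 0.7],
}
noisy = bootstrap_labels(sims)

# Toy committee votes (assumed to come from models trained on the noisy labels).
votes = {
    ("a1", "b1"): [True, True, True],
    ("a2", "b7"): [False, False, False],
    ("a3", "b4"): [True, False, True, False],
}
query = most_disputed(votes)  # the pair a human annotator would be asked about
```

In this sketch the noisy labels cost no manual effort, and the human is queried only for the pair on which the committee disagrees most. A real pipeline would replace the fixed threshold and the hand-written votes with the heuristic and the classifiers described in the paper.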
This entry is part of the university bibliography.
ORCID: Primpeli, Anna (https://orcid.org/0000-0002-1783-2482); Bizer, Christian (https://orcid.org/0000-0003-2367-0237); Keuper, Margret