The SearchEngine: A holistic approach to matching


Doherr, Thorsten


dp23001.pdf - Published


URN: urn:nbn:de:bsz:180-madoc-643100
Document Type: Working paper
Year of publication: 2023
Journal or publication series: ZEW Discussion Papers
Volume: 23-001
Place of publication: Mannheim
Publication language: English
Institution: Other institutions > ZEW - Leibniz-Zentrum für Europäische Wirtschaftsforschung
MADOC publication series: Publications of the ZEW (Leibniz-Zentrum für Europäische Wirtschaftsforschung) > ZEW Discussion Papers
Subject: 330 Economics
Classification: JEL: C81, C88
Keywords (English): data linkage, firm matching, entity resolution, machine learning
Abstract: The SearchEngine is an open-source project providing an integrated framework for diverse matching activities, especially the linkage of large-scale firm data by fuzzy criteria such as company names and addresses. At its core, it uses an efficient candidate retrieval mechanism that implements a word- or token-driven heuristic. Every record in one table becomes a search term used to retrieve similar candidate records from the base table according to a search strategy, which replaces the blocking strategies of conventional matching efforts. Because similarity is inherently established by the candidate selection, the only remaining step is to filter out false positives, using the metadata export file derived from the matching heuristic to implement a machine learning approach. This paper discusses the general foundation of the heuristic and the algorithm, while two detailed walkthroughs of company linkages provide practical examples.
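
The token-driven candidate retrieval described in the abstract can be illustrated with a minimal Python sketch. This is not the SearchEngine's actual implementation: the function names (`tokenize`, `build_index`, `token_weight`, `retrieve_candidates`), the IDF-style token weighting, and the threshold value are all assumptions chosen only to show the general idea of using each record as a search term against an inverted token index of the base table.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(base_records):
    """Map each token to the set of base-table record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in base_records.items():
        for token in set(tokenize(text)):
            index[token].add(record_id)
    return index


def token_weight(token, index, n_records):
    """Assumed weighting: rare tokens count more than frequent ones."""
    doc_freq = len(index.get(token, ()))
    return math.log(1 + n_records / max(doc_freq, 1))


def retrieve_candidates(query, index, n_records, threshold=0.5):
    """Return (record_id, score) pairs, best first.

    The score is the share of the query's total token weight that is
    found in the candidate record; candidates below the threshold are
    dropped.
    """
    weights = {t: token_weight(t, index, n_records) for t in set(tokenize(query))}
    total = sum(weights.values())
    if total == 0:
        return []
    scores = defaultdict(float)
    for token, weight in weights.items():
        for record_id in index.get(token, ()):
            scores[record_id] += weight
    hits = [(rid, s / total) for rid, s in scores.items() if s / total >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical base table and search term, for illustration only.
base_table = {
    1: "Acme Software GmbH Mannheim",
    2: "Acme Hardware GmbH Berlin",
    3: "Beta Consulting AG Mannheim",
}
base_index = build_index(base_table)
candidates = retrieve_candidates("ACME Software GmbH", base_index, len(base_table))
```

In this sketch the index lookup plays the role that a blocking key would play in a conventional linkage: only records sharing at least one token with the search term are ever scored. The per-candidate scores are the kind of information that could be exported as features for a downstream false-positive filter.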




This document is provided by the publication server of the University Library of Mannheim.



