Disambiguation by namesake risk assessment


Doherr, Thorsten


[img] PDF
dp21021.pdf - Published

Download (875kB)

URL: https://madoc.bib.uni-mannheim.de/59129
URN: urn:nbn:de:bsz:180-madoc-591293
Document Type: Working paper
Year of publication: 2021
Volume: 21-021
Place of publication: Mannheim
Publication language: English
Institution: Sonstige Einrichtungen > ZEW - Leibniz-Zentrum für Europäische Wirtschaftsforschung
MADOC publication series: Veröffentlichungen des ZEW (Leibniz-Zentrum für Europäische Wirtschaftsforschung) > ZEW Discussion Papers
Subject: 330 Economics
Classification: JEL: C18 , C36,
Keywords (English): Homonymy , namesakes , disambiguation , scientific careers , inventors , patents , publications
Abstract: Most bibliometric databases only provide names as the handle to their careers leading to the issue of namesakes. We introduce a universal method to assess the risk of linking documents of different individuals sharing the same name with the goal of collecting the documents into personalized clusters. A theoretical setup for the probability of drawing a namesake depending on the number of namesakes in the population and the size of the observed unit replaces the need for training datasets, thereby avoiding a namesake bias caused by the inherent underestimation of namesakes in training/benchmark data. A Poisson model based on a master sample of unambiguously identified individuals estimates the main component, the number of namesakes for any given name. To implement the algorithm, we reduce the complexity in the data by resolving similarity in properties. At the core of the implementation is a mechanism returning the unit size of the intersected mutual properties linking two documents. Because of the high computational demands of this mechanism, it is a necessity to discuss means to optimize the procedure.




Das Dokument wird vom Publikationsserver der Universitätsbibliothek Mannheim bereitgestellt.




Metadata export


Citation


+ Search Authors in

+ Download Statistics

Downloads per month over past year

View more statistics



You have found an error? Please let us know about your desired correction here: E-Mail


Actions (login required)

Show item Show item