Eliminating fuzzy duplicates in crowdsourced lexical resources

Kiselev, Yuri ; Ustalov, Dmitry ; Porshnev, Sergey

URL: https://ub-madoc.bib.uni-mannheim.de/43369
URN: urn:nbn:de:bsz:180-madoc-433699
Document Type: Conference or workshop publication
Year of publication: 2016
Book title: Proceedings of the Eighth Global WordNet Conference (GWC-16) : January 27-30, Bucharest, Romania
Page range: 161-167
Conference title: Global WordNet Conference 2016
Location of the conference venue: Bucharest, Romania
Date of the conference: January 27-30, 2016
Publisher: Barbu Mititelu, Verginica
Place of publication: Bucharest
Publishing house: Global WordNet Association
ISBN: 978-606-714-239-6, 978-973-0-20728-6
Publication language: English
Institution: School of Business Informatics and Mathematics > Information Systems III: Enterprise Data Analysis (Ponzetto 2016-)
Subject: 004 Computer science, internet
Abstract: Collaborative creation of lexical resources is a trending approach to building high-quality thesauri in a short time span at a remarkably low cost. The key idea is to invite non-expert participants to express and share their knowledge with the aim of constructing a resource. However, this approach tends to be noisy and error-prone, which makes data cleansing a highly topical task. In this paper, we study different techniques for synset deduplication, including machine-based and crowd-based ones. Finally, we put forward an approach that solves the deduplication problem fully automatically, with quality comparable to that of the expert-based approach.

This document is provided by the publication server of the Mannheim University Library.

This record was not published during an activity at the University of Mannheim; it is an external publication.

ORCID: Ustalov, Dmitry (0000-0002-9979-2188)