Treebanking user-generated content: A proposal for a unified representation in Universal Dependencies


Sanguinetti, Manuela ; Bosco, Cristina ; Cassidy, Lauren ; Çetinoğlu, Özlem ; Cignarella, Alessandra Teresa ; Lynn, Teresa ; Rehbein, Ines ; Ruppenhofer, Josef ; Seddah, Djamé ; Zeldes, Amir



URL: https://madoc.bib.uni-mannheim.de/55423
Additional URL: https://www.aclweb.org/anthology/2020.lrec-1.645
URN: urn:nbn:de:bsz:180-madoc-554237
Document Type: Conference or workshop publication
Year of publication: 2020
Book title: LREC 2020 Marseille : Twelfth International Conference on Language Resources and Evaluation, May 11-16, 2020, Palais du Pharo, Marseille, France : conference proceedings
Page range: 5240-5250
Conference title: LREC 2020
Location of the conference venue: Marseille, France
Date of the conference: 11-16 May 2020
Editor: Calzolari, Nicoletta
Place of publication: Paris ; Mannheim
Publishing house: ELRA ; IDS, Bibliothek
ISBN: 979-10-95546-34-4, 979-10-95546-61-0
Publication language: English
Institution: Außerfakultäre Einrichtungen > SFB 884
Pre-existing license: Creative Commons Attribution, Non-Commercial 4.0 International (CC BY-NC 4.0)
Subject: 004 Computer science, internet
Abstract: The paper presents a discussion on the main linguistic phenomena of user-generated texts found in web and social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this paper is twofold: (1) to provide a short, though comprehensive, overview of such treebanks - based on available literature - along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The main goal of this paper is to provide a common framework for those teams interested in developing similar resources in UD, thus enabling cross-linguistic consistency, which is a principle that has always been in the spirit of UD.

This entry is part of the university bibliography.

The document is provided by the publication server of the Universitätsbibliothek Mannheim.



