Code-switching ubique est - Language identification and part-of-speech tagging for historical mixed text

Schulz, Sarah ; Keller, Mareike

Document Type: Conference or workshop publication
Year of publication: 2016
Book title: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2016) : August 11, 2016, Berlin, Germany
Page range: 43-51
Conference title: 10th SIGHUM Workshop
Location of the conference venue: Berlin, Germany
Date of the conference: 11.08.2016
Editor: Reiter, Nils
Place of publication: Stroudsburg, PA
Publishing house: Association for Computational Linguistics
ISBN: 978-1-945626-09-8
Publication language: English
Institution: School of Humanities > Anglistik IV - Anglistische Linguistik/Diachronie (Trips 2006-)
Subject: 400 Language, linguistics
Individual keywords (German): Linguistische Annotation, Mittelenglisch, Latein
Keywords (English): linguistic annotation, POS tagging, code-switching, Middle English, Latin
Abstract: In this paper, we describe the development of a language identification system and a part-of-speech tagger for Latin-Middle English mixed text. To this end, we annotate data with language IDs and Universal POS tags (Petrov et al., 2012). For both sub-tasks, we train a conditional random field classifier, including features generated by the TreeTagger models of both languages. The evaluation is both general and task-specific. Moreover, we describe our efforts to move beyond proof-of-concept implementations of tools towards a more task-oriented approach, showing how our techniques can be applied in Humanities research.

This entry is part of the university bibliography.
