Lemmatising verbs in Middle English corpora: The benefit of enriching the Penn-Helsinki Parsed Corpus of Middle English 2 (PPCME2), the Parsed Corpus of Middle English Poetry (PCMEP), and A Parsed Linguistic Atlas of Early Middle English (PLAEME)


Percillier, Michael ; Trips, Carola



URL: http://www.lrec-conf.org/proceedings/lrec2020/inde...
Additional URL: http://www.lrec-conf.org/proceedings/lrec2020/pdf/...
Document Type: Conference or workshop publication
Year of publication: 2020
Book title: LREC 2020: Twelfth International Conference on Language Resources and Evaluation, May 11-16, 2020, Marseille, France: conference proceedings
Page range: 7172-7180
Conference title: LREC 2020
Location of the conference venue: Marseille, France
Date of the conference: 11-16 May 2020
Editor: Calzolari, Nicoletta
Place of publication: Paris
Publishing house: ELRA
ISBN: 979-10-95546-34-4
Publication language: English
Institution: School of Humanities > Anglistik IV - Anglistische Linguistik/Diachronie (Trips 2006-)
Subject: 400 Language, linguistics
Abstract: This paper describes the lemmatisation of three annotated corpora of Middle English: the Penn-Helsinki Parsed Corpus of Middle English 2 (PPCME2), the Parsed Corpus of Middle English Poetry (PCMEP), and A Parsed Linguistic Atlas of Early Middle English (PLAEME). Lemmatisation is a prerequisite for systematically investigating the argument structures of Middle English verbs. Creating this tool and enriching existing parsed corpora of Middle English is part of the project Borrowing of Argument Structure in Contact Situations (BASICS), which seeks to explain to what extent verbs copied from Old French had an impact on the grammar of Middle English. First, we lemmatised the PPCME2 by (1) creating an inventory of form-lemma correspondences linking forms in the PPCME2 to lemmas in the Middle English Dictionary (MED), and (2) inserting this lemma information into the corpus (precision: 94.85%, recall: 98.92%). Second, we enriched the PCMEP and PLAEME, which adopted the annotation format of the PPCME2, with verb lemmas in order to undertake studies that fill the well-known data gap in the 1250–1350 subperiod of the PPCME2. A case study of reflexives shows that this method yields much more reliable results in terms of diachrony, diatopy and contact-induced change.
Additional information: Online resource

This entry is part of the university bibliography.




Percillier, Michael (ORCID: 0000-0002-5195-3501); Trips, Carola (2020). Lemmatising verbs in Middle English corpora: The benefit of enriching the Penn-Helsinki Parsed Corpus of Middle English 2 (PPCME2), the Parsed Corpus of Middle English Poetry (PCMEP), and A Parsed Linguistic Atlas of Early Middle English (PLAEME). In: Calzolari, Nicoletta (ed.), LREC 2020: Twelfth International Conference on Language Resources and Evaluation, May 11-16, 2020, Marseille, France: conference proceedings, 7172-7180. Paris: ELRA. [Conference or workshop publication]