Comparing rule-based and SMT-based spelling normalisation for English historical texts

Schneider, Gerold ; Pettersson, Eva ; Percillier, Michael

Document Type: Conference or workshop publication
Year of publication: 2017
Book title: Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language : 22 May 2017, Gothenburg
Journal or publication series title: NEALT Proceedings Series
Volume: 32
Page range: 40-46
Conference title: NoDaLiDa 2017 Workshop on Processing Historical Language
Location of the conference venue: Göteborg, Sweden
Date of the conference: 22.05.2017
Editor: Bouma, Gerlof
Place of publication: Linköping
Publishing house: Linköping University Electronic Press
ISBN: 978-91-7685-503-4
ISSN: 1650-3686, 1650-3740
Publication language: English
Institution: School of Humanities > Anglistik IV - Anglistische Linguistik/Diachronie (Trips 2006-)
Subject: 420 English
Abstract: To be able to use existing natural language processing tools for analysing historical text, an important preprocessing step is spelling normalisation, converting the original spelling to present-day spelling, before applying tools such as taggers and parsers. In this paper, we compare a probabilistic, language-independent approach to spelling normalisation based on statistical machine translation (SMT) techniques with a rule-based system combining dictionary lookup with rules and non-probabilistic weights. The rule-based system reaches the best accuracy, up to 94% precision at 74% recall, while the SMT system improves each tested period.
Additional information: Linköping Electronic Conference Proceedings ; 133. - Online resource

This entry is part of the university bibliography.
