Improving sentence boundary detection for spoken language transcripts


Rehbein, Ines ; Ruppenhofer, Josef ; Schmidt, Thomas



URL: https://www.aclweb.org/anthology/2020.lrec-1.878
Additional URL: https://ids-pub.bsz-bw.de/frontdoor/deliver/index/...
Document Type: Conference or workshop publication
Year of publication: 2020
Book title: LREC 2020 Marseille : Twelfth International Conference on Language Resources and Evaluation, May 11-16, 2020, Palais du Pharo, Marseille, France : conference proceedings
Page range: 7102-7111
Conference title: LREC 2020
Location of the conference venue: Marseille, France
Date of the conference: May 11-16, 2020
Editor: Calzolari, Nicoletta
Place of publication: Paris ; Mannheim
Publishing house: ELRA ; IDS, Bibliothek
ISBN: 979-10-95546-34-4, 979-10-95546-61-0
Publication language: English
Institution: Non-faculty institutions > SFB 884
Subject: 004 Computer science, internet
Abstract: This paper presents experiments on sentence boundary detection in transcripts of spoken dialogues. Segmenting spoken language into sentence-like units is a challenging task, due to disfluencies, ungrammatical or fragmented structures and the lack of punctuation. In addition, one of the main bottlenecks for many NLP applications for spoken language is the small size of the training data, as the transcription and annotation of spoken language is by far more time-consuming and labour-intensive than processing written language. We therefore investigate the benefits of data expansion and transfer learning and test different ML architectures for this task. Our results show that data expansion is not straightforward and even data from the same domain does not always improve results. They also highlight the importance of modelling, i.e. of finding the best architecture and data representation for the task at hand. For the detection of boundaries in spoken language transcripts, we achieve a substantial improvement when framing the boundary detection problem as a sentence pair classification task, as compared to a sequence tagging approach.




This entry is part of the university bibliography.



