Ocromore : Combining multiple OCR-engine results to improve character recognition accuracy


Kamlah, Jan ; Stegmüller, Johannes


PDF (Ocromore)
Ocromore.pdf - Published


DOI: https://doi.org/10.5281/zenodo.1493860
URL: https://ub-madoc.bib.uni-mannheim.de/48756
URN: urn:nbn:de:bsz:180-madoc-487569
Document Type: Conference presentation
Year of publication: 2018
Conference title: 14. International Bibliotheca Baltica Symposium
Location of the conference venue: Rostock, Germany
Date of the conference: 04.-05.10.2018
Publication language: English
Institution: Zentrale Einrichtungen > University Library
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 020 Library and information sciences
Abstract: One of the goals of the Aktienführer-Datenarchiv project is to process data from the Aktienführer and store it in a structured form in a database. The Aktienführer is a reference work that was published annually in print from 1956 to 1999 and contains data on companies listed on stock exchanges in Germany. High character recognition accuracy is crucial for structure recognition and further analysis of the OCR output. To optimize OCR quality, "Ocromore" was developed, a toolset for combining multiple OCR outputs. The best combined result is achieved with a word-wise, character-confidence-based multiple sequence alignment (MSA) approach. Our results show a character accuracy increase of 0.49% and an error reduction of 33% compared to the best single OCR result.
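
The confidence-based combination mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the Ocromore implementation. It assumes that the hypotheses for one word from several OCR engines have already been aligned to equal length, that every character carries a confidence value, and that a hypothetical gap marker "-" fills positions an engine left empty. The function name, the data layout, and the sample values are all illustrative assumptions.

```python
from collections import defaultdict

GAP = "-"  # hypothetical gap marker inserted by a prior alignment step


def vote_word(aligned):
    """Combine aligned word hypotheses by summing confidences per column.

    `aligned` holds one hypothesis per OCR engine. Each hypothesis is a list
    of (char, confidence) pairs, and all hypotheses have the same length.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(hyp) != length for hyp in aligned):
        raise ValueError("hypotheses must be aligned to equal length")

    result = []
    for column in zip(*aligned):
        # Sum the confidences of identical characters in this column.
        scores = defaultdict(float)
        for char, conf in column:
            scores[char] += conf
        best = max(scores, key=scores.get)
        # A winning gap means the engines agree that no character belongs here.
        if best != GAP:
            result.append(best)
    return "".join(result)


# Made-up sample values for the word "Aktien" from three engines.
engine_a = [("A", 0.9), ("k", 0.8), ("t", 0.9), ("i", 0.4), ("e", 0.9), ("n", 0.9)]
engine_b = [("A", 0.8), ("k", 0.9), ("t", 0.7), ("l", 0.5), ("e", 0.8), ("n", 0.9)]
engine_c = [("A", 0.9), ("k", 0.7), ("t", 0.8), ("i", 0.6), ("e", 0.9), ("n", 0.8)]

combined = vote_word([engine_a, engine_b, engine_c])
print(combined)
```

In this sketch the fourth column is decided by summed confidence: "i" collects 0.4 + 0.6 and outweighs the single "l" at 0.5. The alignment itself, which an MSA approach would have to perform first, is deliberately left out.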




This entry is part of the university bibliography.

The document is provided by the publication server of the Mannheim University Library.





ORCID: Kamlah, Jan (0000-0002-0417-7562)
