Handwritten Text Recognition Error Rate Reduction in Historical Documents using Naive Transcribers

dc.contributor.author: Christlein, Vincent
dc.contributor.author: Nicolaou, Anguelos
dc.contributor.author: Schlauwitz, Thorsten
dc.contributor.author: Späth, Sabrina
dc.contributor.author: Herbers, Klaus
dc.contributor.author: Maier, Andreas
dc.contributor.editor: Burghardt, Manuel
dc.contributor.editor: Müller-Birn, Claudia
dc.date.accessioned: 2018-09-11T12:29:55Z
dc.date.available: 2018-09-11T12:29:55Z
dc.date.issued: 2018
dc.description.abstract: Handwritten text recognition (HTR) is a difficult research problem. For historical documents in particular, the task is hard because handwriting style, orthography, and text quality pose significant challenges. Creating a single multi-purpose HTR system seems to be out of reach for current state-of-the-art systems. Therefore, we are interested in the fast creation of specialized HTR systems for a particular set of historical documents. Still, manual annotation by historical experts is expensive and often cannot be applied at a large scale. Instead, we use the transcripts of naive transcribers, which may still contain a significant number of errors. In this paper, we propose to fuse the recognized word chain with the transcripts of naive transcribers, which can be obtained in a cost-effective way. For the actual fusion, we rely on a word-level approach, the so-called Recognizer Output Voting Error Reduction (ROVER). Results indicate that we are able to reduce the Word Error Rate (WER) of an HTR system trained with only a few pages from 2.6% to 19.2% with two additional transcribers with 25.1% and 27.1% WER each. This performance is already close to that of current state-of-the-art systems trained with significantly more data. [en]
dc.identifier.doi: 10.18420/infdh2018-13
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/16993
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: INF-DH-2018
dc.subject: handwritten text recognition
dc.subject: naive transcribers
dc.subject: line fusion
dc.title: Handwritten Text Recognition Error Rate Reduction in Historical Documents using Naive Transcribers [en]
dc.type: Text/Workshop Paper
gi.citation.publisherPlace: Bonn
gi.conference.date: 25 September 2018
gi.conference.location: Berlin, Germany
gi.conference.sessiontitle: GI-Workshop

Files

Original bundle
Name: INF-DH-2018_paper_13.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format