
Analyzing Historical Legal Textcorpora: German VET and CVET regulations

dc.contributor.author: Reiser, Thomas
dc.contributor.author: Dörpinghaus, Jens
dc.contributor.author: Steiner, Petra
dc.contributor.editor: Klein, Maike
dc.contributor.editor: Krupka, Daniel
dc.contributor.editor: Winter, Cornelia
dc.contributor.editor: Gergeleit, Martin
dc.contributor.editor: Martin, Ludger
dc.date.accessioned: 2024-10-21T18:24:18Z
dc.date.available: 2024-10-21T18:24:18Z
dc.date.issued: 2024
dc.description.abstract [en]: The digitization of historical documents has gained particular interest in recent years. The majority of research endeavors aim at digitizing historical documents by extracting text from scanned images. A pipeline that transcribes scanned documents into fully structured texts was utilized to digitize over 900 German VET and CVET regulations. As a preliminary investigation, a basic corpus analysis was conducted to assess the usability of the digitized documents and the necessity for document digitization methods that can generate transcripts that maintain the logical text structure and hierarchy. This paper focuses on the processing of the transcripts created from German VET and CVET regulation images to demonstrate the advantages of fully structured text over plain OCR results and to illustrate that even simple analyses require more information for more comprehensive document understanding.
dc.identifier.doi: 10.18420/inf2024_174
dc.identifier.isbn: 978-3-88579-746-3
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/45152
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: INFORMATIK 2024
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352
dc.subject: Document digitization
dc.subject: OCR
dc.subject: Legal texts
dc.subject: Corpus analysis
dc.title [en]: Analyzing Historical Legal Textcorpora: German VET and CVET regulations
dc.type: Text/Conference Paper
gi.citation.endPage: 2018
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 2007
gi.conference.date: 24-26 September 2024
gi.conference.location: Wiesbaden
gi.conference.sessiontitle: Digitalization and AI for and in Education and Educational Research (DAI-EaR'24)

Files

Original bundle
Name: Reiser_et_al_Analyzing_Historical_Legal_Textcorpora.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format