Contributors: Reiser, Thomas; Dörpinghaus, Jens; Steiner, Petra; Klein, Maike; Krupka, Daniel; Winter, Cornelia; Gergeleit, Martin; Martin, Ludger
Dates: 2024-10-21; 2024-10-21; 2024
ISBN: 978-3-88579-746-3
URL: https://dl.gi.de/handle/20.500.12116/45152

Abstract: The digitization of historical documents has gained particular interest in recent years. The majority of research endeavors aim at digitizing historical documents by extracting text from scanned images. A pipeline that transcribes scanned documents into fully structured texts was utilized to digitize over 900 German VET and CVET regulations. As a preliminary investigation, a basic corpus analysis was conducted to assess the usability of the digitized documents and the necessity for document digitization methods that can generate transcripts that maintain the logical text structure and hierarchy. This paper focuses on the processing of the transcripts created from German VET and CVET regulation images to demonstrate the advantages of fully structured text over plain OCR results and to illustrate that even simple analyses require more information for more comprehensive document understanding.

Language: en
Keywords: Document digitization; OCR; Legal texts; Corpus analysis
Title: Analyzing Historical Legal Textcorpora: German VET and CVET regulations
Type: Text/Conference Paper
DOI: 10.18420/inf2024_174
ISSN: 1617-5468