Browsing by author "Niekler, Andreas"
- Journal article: Content Analysis between Quality and Quantity (Datenbank-Spektrum: Vol. 15, No. 1, 2015). Lemke, Matthias; Niekler, Andreas; Schaal, Gary S.; Wiedemann, Gregor. Because the digital humanities lack canonical heuristics, social science research using Text Mining tools requires a blended reading approach. By progressively integrating quantitative and qualitative analyses of complex textual data, blended reading imposes various requirements on the implementation of Text Mining infrastructures. The article presents the Leipzig Corpus Miner (LCM), developed in the joint research project "ePol: Post-Democracy and Neoliberalism" in response to social science research requirements. The functionalities offered by the LCM may serve as a best-practice example of processing data in accordance with blended reading.
- Text document: Digitizing Drilling Logs - Challenges of typewritten forms (INFORMATIK 2021, 2021). Bürgl, Kim; Reinhardt, Lea; Binder, Frank; Müller, Lydia; Niekler, Andreas. In this work, we show how mining and geological documentation in the form of drilling reports can be digitized and further processed. Processing these typed and handwritten forms poses challenges for document management in renaturation projects. We highlight the structural problems of drilling reports and present three approaches for recognizing and processing the information documented in them. We approach the problem with optical character recognition and document layout analysis techniques. Layout analysis was performed using both a heuristic approach and a neural network for layout recognition. In detail, the approaches are Form Processing (A), Table Detection by Line Counting (B), and processing with Mask R-CNN (C). A case study shows initial results and challenges. B and C are more robust than A to small changes in the form. Given more training data, C recognizes columns better than B in cases where table boundaries are not respected. B and C also allow other language models to be used for OCR and can thus also recognize handwriting, given appropriate training data.