Browsing by keyword "information extraction"
Results 1 - 2 of 2
- Text document: Digitizing Drilling Logs - Challenges of typewritten forms (INFORMATIK 2021, 2021). Bürgl, Kim; Reinhardt, Lea; Binder, Frank; Müller, Lydia; Niekler, Andreas. In this work, we show prospects of how mining and geological documentation in the form of drilling reports can be digitized and further processed. Processing these typed and handwritten forms poses challenges for document management in renaturation projects. We highlight the structural problems of drilling reports and present three approaches for recognizing and processing the information documented in them. We use optical character recognition and document layout analysis techniques to approach the problem. Layout analysis was performed using a heuristic approach and a neural network for layout recognition. In detail, we show the approaches Form Processing (A), Table detection by line counting (B) and processing with Mask-R-CNN (C). A case study is used to show initial results and challenges. B and C are more robust than A to small changes in the form. C can recognize columns better with more training data than B in cases where table boundaries are not respected. B and C also allow other language models to be used for OCR and can thus also recognize handwriting with appropriate training data.
- Text document: A Hybrid Information Extraction Approach Exploiting Structured Data Within a Text Mining Process (BTW 2019, 2019). Kiefer, Cornelia; Reimann, Peter; Mitschang, Bernhard. Many data sets encompass structured data fields with embedded free text fields. The text fields allow customers and workers to input information which cannot be encoded in structured fields. Several approaches use structured and unstructured data in isolated analyses. The result of isolated mining of structured data fields misses crucial information encoded in free text. The result of isolated text mining often mainly repeats information already available from structured data. The actual information gain of isolated text mining is thus limited. The main drawback of both isolated approaches is that they may miss crucial information. The hybrid information extraction approach suggested in this paper addresses this issue. Instead of extracting information that in large parts was already available beforehand, it extracts new, valuable information from free texts. Our solution exploits results of analyzing structured data within the text mining process, i.e., structured information guides and improves the information extraction process on textual data. Our main contributions comprise the description of the concept of hybrid information extraction as well as a prototypical implementation and an evaluation with two real-world data sets from aftersales and production with English and German free text fields.