- Journal: A Big Data Case Study in Digital Humanities (Datenbank-Spektrum: Vol. 19, No. 1, 2019) Heyer, Gerhard; Tiepmar, Jochen
- Conference paper: Classifying figures and illustrations in electronics datasheets: A comparative evaluation of recent computer vision models on a custom collection of 4000 technical documents (INFORMATIK 2023 - Designing Futures: Zukünfte gestalten, 2023) Perakis, Lymperis; Balling, Julian; Binder, Frank; Heyer, Gerhard; Kreupl, Franz. We report findings from a comparative evaluation of several recent object detection models applied to a domain-specific use case in technical document analysis and graphics recognition. More specifically, we apply models from the EfficientDet and YOLO model families to detect and classify figures in electronics datasheets according to a custom classification scheme. We identify YOLOv7-D6 as the most accurate model in our study and show that it can successfully solve this task. We highlight an iterative approach to figure annotation in document page images for creating a comprehensive and balanced custom dataset for our use case. In our experiments, the object detection models perform on par with state-of-the-art results from the literature and related studies.
- Conference paper: An evaluation framework for semantic search in P2P networks (10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 –, 2010) Holz, Florian; Witschel, Hans-Friedrich; Heinrich, Gregor; Heyer, Gerhard; Teresniak, Sven. We address the problem of evaluating peer-to-peer information retrieval (P2PIR) systems with semantic overlay structure. The P2PIR community lacks a commonly accepted testbed of the kind that TREC provides for the classic IR community. The problem with using classic test collections in a P2P scenario is that they provide no realistic distribution of documents and queries over peers, which is crucial for realistically simulating and evaluating semantic overlays. On the other hand, document collections that can be easily distributed (e.g. by exploiting categories or author information) lack both queries and relevance judgments. Therefore, we propose an evaluation framework that provides a strategy for constructing a P2PIR testbed, consisting of a prescription for content distribution, query generation and measuring effectiveness without the need for human relevance judgments. It can be used with any document collection that contains author information and document relatedness (e.g. references in scientific literature). Author information is used for assigning documents to peers; relatedness is used for generating queries from related documents. The ranking produced by the P2PIR system is evaluated by comparing it to the ranking of a centralised IR system using a new evaluation measure related to mean average precision. The combination of these three elements (realistic content distribution, realistic and automated query generation and distribution, and a meaningful and flexible evaluation measure for rankings) offers an improvement over existing P2PIR evaluation approaches.
- Conference paper: Generic tools and individual research needs in the Digital Humanities – Can agile development help? (INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft (Workshop-Beiträge), 2019) Heyer, Gerhard; Kahmann, Christian; Kantner, Cathleen. Digital Humanities research projects from many different target disciplines regularly encounter the same recurring key problems and key procedures, such as preprocessing, standard text analytics, and visualization, which would be very time consuming if conducted without DH tools. This calls for the use of generic platforms. However, there is a trade-off, since researchers from different disciplines look at their objects from different theoretical perspectives. This raises the general question of how we can deal with this very typical conflict, and whether agile development might be a suitable development method to cope with it. To address this question, we report on an experience during a DH summer school as a condensed experiment in dealing with this trade-off using the iLCM as a generic platform. In summary, although we have not arrived at a procedural solution for balancing individual user needs against generic problems that call for generic tools, our summer school experience illustrates the high potential of a software ecosystem supporting agile development in Digital Humanities, and may help to better understand the role of generic software tools in DH.
- GI Annual Meeting 2010. Workshop. eHumanities - How does computer science benefit? (INFORMATIK 2010. Service Science – Neue Perspektiven für die Informatik. Band 2, 2010) Heyer, Gerhard; Büchler, Marco
- Journal article: Interaktive explorative Suche in großen Dokumentbeständen (Datenbank-Spektrum: Vol. 11, No. 3, 2011) Heyer, Gerhard; Keim, Daniel; Teresniak, Sven; Oelke, Daniela. In the classic information retrieval paradigm, the focus is on finding documents that contain information or facts matching the user's presumed information need. The user poses queries for which they assume the information system holds unambiguous answers that merely need to be found and returned. In many cases, however, the user is less interested in the facts themselves than in how facts are reported: Which facts are reported? By which criteria are facts selected? How are facts evaluated? Which conceptualizations of the application domain are presupposed? And how do evaluations and conceptualizations change over time? The approach presented outlines a possible solution for exploratory search in large volumes of data.
- Conference paper: Salton und Wittgenstein in den Humanities: Über die Semantik in Philosophischen Texten (INFORMATIK 2010. Service Science – Neue Perspektiven für die Informatik. Band 2, 2010) Büchler, Marco; Heyer, Gerhard. In computer science, semantics is described by discriminating terms. Especially in philosophical texts, however, precisely these weighty terms are often missing. Starting from the widely used discriminative semantics, a contrastive semantics is presented using the problem of aphorisms and wisdom sayings (Sinn- und Weisheitssprüche). The method introduced is a lesson learned from the eAQUA project [BHG08, HBB+10] in dealing with ancient texts.
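
The abstract of "An evaluation framework for semantic search in P2P networks" in this list describes assigning documents to peers by author and scoring a P2P ranking against a centralised one. The following is a minimal sketch of those two steps, not the authors' implementation. The function names, the one-peer-per-first-author rule, and the use of plain average precision over the centralised top-k are all assumptions made for illustration; the abstract only says the actual measure is "related to mean average precision".

```python
from collections import defaultdict


def assign_to_peers(docs):
    """Group document ids by first author.

    Assumption: each first author is simulated as one peer.
    """
    peers = defaultdict(list)
    for doc_id, meta in docs.items():
        peers[meta["authors"][0]].append(doc_id)
    return dict(peers)


def average_precision_vs_reference(p2p_ranking, central_ranking, k=10):
    """Average precision of the P2P ranking, treating the top-k documents
    of the centralised ranking as the relevant set (stand-in measure,
    not the measure defined in the paper)."""
    relevant = set(central_ranking[:k])
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(p2p_ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


# Hypothetical toy collection
docs = {
    "d1": {"authors": ["Author A"]},
    "d2": {"authors": ["Author B"]},
    "d3": {"authors": ["Author A", "Author B"]},
}
peers = assign_to_peers(docs)
score = average_precision_vs_reference(
    ["d1", "d3", "d2"], ["d1", "d2", "d3"], k=2
)
```

Using the centralised system as the reference removes the need for human relevance judgments, which is the property the abstract emphasises. Query generation from related documents is omitted from this sketch.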