Text document
A Classification Framework for Scientific Documents to Support Knowledge Graph Population
Date
2025
Publisher
Gesellschaft für Informatik, Bonn
Abstract
Research papers are a central medium for communicating new scientific insights and progress and are nowadays stored as PDF files. However, little effort is spent during the publication process on reorganizing their information with effective knowledge classification and comprehensive representation. In software engineering (SE), papers are also linked to software artifacts and research data. Knowledge and empirical evidence are aggregated through systematic literature studies, which tend to be very time-consuming and require manual inspection of the respective research artifacts (both papers and data). Research knowledge graphs such as the Open Research Knowledge Graph (ORKG) aim to contribute to and rethink scholarly communication by providing formats that ease the processing of semantic information. To this end, ORKG offers templates to summarize and structure a research artifact's content, along with its metadata. Based on these, researchers can connect similar papers, reuse replication artifacts, and generate literature studies more easily. However, adding papers to ORKG is still a tedious manual process. Moreover, selecting suitable templates is challenging and can be highly domain-specific. To support this manual process, we aim to provide an automated classification framework for scientific papers that supports knowledge graph population. The framework is intended to be flexible, allowing various input data and schemas, so that it can be applied and trained in a multitude of SE research fields. In this paper, we present the concept of the framework's implementation and an excerpt of the evaluation results in one SE research subfield, namely software architecture and design.