Automated Traceability Link Recovery Between Requirements and Source Code

Hey, Tobias; Herrmann, Andrea
2024-07-26
https://dl.gi.de/handle/20.500.12116/44191

Efficient development, maintenance, and management of software systems rely heavily on understanding the relationships between the various software artifacts. Manual creation and maintenance of traceability information between these artifacts incur high costs due to the required human expertise. This often results in a lack of available traceability information, which hampers the efficiency of software projects. The FTLR approach presented in this dissertation aims to enhance automatic traceability link recovery between requirements and source code by leveraging a fine-grained semantic similarity comparison using pre-trained word embeddings and Word Mover's Distance. FTLR achieves significant improvements in identifying traceability links by employing this fine-grained mapping with subsequent majority-vote-based aggregation. Additionally, it employs a novel approach for filtering irrelevant parts of requirements using a large language model-based classifier called NoRBERT, which achieves promising results on unseen projects. Furthermore, this dissertation explores the integration of bimodal large language models into FTLR but finds no significant performance increase over word embeddings. A comparative analysis showed that FTLR outperforms existing approaches in mean average precision and F1-scores, especially on projects with object-oriented source code. However, challenges remain in fully automating traceability link recovery, particularly in large-scale projects.

Language: en
Keywords: Traceability Link Recovery; Requirements Classification; Large Language Models for Software Engineering; Information Retrieval; Machine Learning
Type: Text/Journal Article
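
The fine-grained mapping with majority-vote aggregation described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes toy two-dimensional vectors in place of pre-trained word embeddings, substitutes the relaxed lower bound of Word Mover's Distance for the full optimal-transport computation, and lets each method vote for exactly one requirement. All names (EMB, relaxed_wmd, recover_links, the threshold value, and the example data) are hypothetical.

```python
import math
from collections import Counter

# Toy 2-d vectors standing in for pre-trained word embeddings (assumption).
EMB = {
    "user": (1.0, 0.0), "account": (0.9, 0.1), "login": (0.8, 0.2),
    "password": (0.7, 0.3),
    "report": (0.0, 1.0), "print": (0.1, 0.9), "export": (0.2, 0.8),
}

def _one_way(src, dst):
    # Every source word sends all of its mass to its nearest target word.
    return sum(min(math.dist(EMB[s], EMB[d]) for d in dst) for s in src) / len(src)

def relaxed_wmd(a, b):
    # Relaxed WMD: the tighter of the two one-directional lower bounds.
    # This is a simplification, not the exact Word Mover's Distance.
    return max(_one_way(a, b), _one_way(b, a))

def recover_links(requirements, classes, threshold=0.5):
    """Map each class to the requirement that most of its methods vote for."""
    links = {}
    for cls, methods in classes.items():
        votes = Counter()
        for words in methods.values():
            # Fine-grained step: a method votes for its closest requirement,
            # but only if that requirement is close enough (hypothetical threshold).
            dist, req = min((relaxed_wmd(words, rw), rid)
                            for rid, rw in requirements.items())
            if dist <= threshold:
                votes[req] += 1
        if votes:
            # Aggregation step: majority vote over the method-level votes.
            links[cls] = votes.most_common(1)[0][0]
    return links

# Hypothetical, already preprocessed requirement and method word lists.
REQUIREMENTS = {
    "R1": ["user", "login", "password"],
    "R2": ["export", "report"],
}
CLASSES = {
    "AccountService": {
        "login": ["login", "user"],
        "resetPassword": ["password", "account"],
        "printSummary": ["print", "report"],
    },
    "ReportWriter": {
        "export": ["export", "report"],
        "print": ["print"],
    },
}

links = recover_links(REQUIREMENTS, CLASSES)
```

In this toy data, two of the three AccountService methods are closest to R1, so the class is linked to R1 even though one method votes for R2. A realistic setup would load real embeddings, weight words by frequency, and could allow several links per class.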