Conference paper
Semi-supervised topic modelling as a tool for hypothesis-driven forensic communication analysis
Document type
Text/Conference Paper
Date
2024
Publisher
Gesellschaft für Informatik e.V.
Abstract
Mobile communication data has become a crucial source of evidence in digital forensics. However, the high volume of chat messages presents a challenge for investigators, mainly because only a tiny fraction is relevant to the case. A method is therefore needed that summarises the messages in a way that separates the forensically relevant parts from the irrelevant ones. Topic modelling can help, as it automatically extracts the main ideas of the chats. Traditional unsupervised topic modelling alone is not sufficient, however, because forensic data analysis is inherently hypothesis-driven. This research incorporates case-specific knowledge into topic modelling to extract topics that align with the investigator’s expectations. Two semi-supervised topic modelling algorithms and proposed extensions were compared using real-case data. The results of the user study suggested that extending algorithms based on word embeddings could help find evidence for suspected topics. Furthermore, the study examined the correlation between these findings and topic coherence, a standard automatic evaluation measure, and found that topic coherence did not reflect actual interpretability.