Authors: Folino, Francesco; Folino, Gianluigi; Guarascio, Massimo; Pontieri, Luigi
Dates: 2023-02-15; 2023-02-15; 2022; 2022
URLs: http://dx.doi.org/10.1007/s12599-022-00749-9; https://dl.gi.de/handle/20.500.12116/40222

Abstract: Predicting the final outcome of an ongoing process instance is a key problem in many real-life contexts. This problem has been addressed mainly by discovering a prediction model by using traditional machine learning methods and, more recently, deep learning methods, exploiting the supervision coming from outcome-class labels associated with historical log traces. However, a supervised learning strategy is unsuitable for important application scenarios where the outcome labels are known only for a small fraction of log traces. In order to address these challenging scenarios, a semi-supervised learning approach is proposed here, which leverages a multi-target DNN model supporting both outcome prediction and the additional auxiliary task of next-activity prediction. The latter task helps the DNN model avoid spurious trace embeddings and overfitting behaviors. In extensive experimentation, this approach is shown to outperform both fully-supervised and semi-supervised discovery methods using similar DNN architectures across different real-life datasets and label-scarce settings.

Keywords: Deep learning; Outcome prediction; Process mining; Semi-supervised learning
Title: Semi-Supervised Discovery of DNN-Based Outcome Predictors from Scarcely-Labeled Process Logs
Type: Text/Journal Article
DOI: 10.1007/s12599-022-00749-9
ISSN: 1867-0202
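
Illustrative note: the abstract describes a multi-target model trained on both outcome prediction and next-activity prediction, with outcome labels available for only some traces. The sketch below shows one plausible way such a combined training objective could be computed. It is not taken from the paper: the function names, the `aux_weight` parameter, the use of `None` to mark an unlabeled trace, and the plain weighted sum of two cross-entropy terms are all assumptions made for illustration.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(max(probs[target], 1e-12))


def multi_target_loss(outcome_probs, outcome_labels,
                      next_act_probs, next_acts, aux_weight=1.0):
    """Hypothetical combined loss for a two-headed (multi-target) model.

    outcome_labels[i] is None when trace i has no known outcome label,
    so that trace contributes only to the next-activity term.
    Assumes next_acts is non-empty.
    """
    # Outcome term: averaged over the labeled traces only.
    labeled = [(p, y) for p, y in zip(outcome_probs, outcome_labels)
               if y is not None]
    if labeled:
        outcome_loss = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    else:
        outcome_loss = 0.0

    # Auxiliary term: next-activity targets come from the log itself,
    # so every trace contributes, labeled or not.
    next_loss = sum(cross_entropy(p, a)
                    for p, a in zip(next_act_probs, next_acts)) / len(next_acts)

    return outcome_loss + aux_weight * next_loss


# Two traces: the first has outcome label 0, the second is unlabeled.
loss = multi_target_loss(
    outcome_probs=[[0.5, 0.5], [0.9, 0.1]],
    outcome_labels=[0, None],
    next_act_probs=[[1.0, 0.0], [0.0, 1.0]],
    next_acts=[0, 1],
)
```

In this sketch the unlabeled trace still produces a training signal through the next-activity term, which is the intuition the abstract gives for why the auxiliary task helps in label-scarce settings.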