Face verification explainability heatmap generation using a vision transformer

Authors: Ricardo Correia, Paulo L Correia
Editors: Damer, Naser; Gomez-Barrero, Marta; Raja, Kiran; Rathgeb, Christian; Sequeira, Ana F.; Todisco, Massimiliano; Uhl, Andreas
Date: 2023-12-12
Year: 2023
ISBN: 978-3-88579-733-3
ISSN: 1617-5468
URI: https://dl.gi.de/handle/20.500.12116/43270
Language: en
Keywords: Trustworthiness and explainability; Face and gesture recognition
Type: Text/Conference Paper

Abstract:
Explainable Face Recognition (XFR) is a critical technology to support the large-scale deployment of learning-based face recognition solutions. This paper aims to contribute to a more transparent use of Vision Transformers (ViTs) for face verification (FV) tasks by proposing a novel approach for generating FV explainability heatmaps for both positive and negative decisions. The proposed solution leverages the attention maps generated by a ViT and employs masking techniques to create masks based on the regions highlighted in the attention maps. These masks are applied to the pair of faces, and the masking technique with the most impact on the decision is selected to generate heatmaps for the probe-gallery pair of faces. These heatmaps offer insight into the decision-making process by showing which face regions are most important for the verification outcome. The key novelty of this paper lies in the proposed approach for generating explainability heatmaps tailored to verification pairs in the context of ViT models: it combines the ViT attention-map regions of the probe-gallery pair to create masks that allow evaluating those regions' impact on the verification decision, for both positive and negative decisions.
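
The pipeline described in the abstract (attention maps, masks, impact on the decision, heatmap) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the masking techniques, so the two mask combiners (union and intersection), the quantile threshold, the mean-fill occlusion, and the pixel-level cosine similarity standing in for a ViT-based verification score are all assumptions made here for illustration, as are all function names. Images and attention maps are assumed to be 2D arrays of the same shape.

```python
import numpy as np

def cosine_similarity(a, b):
    """Stand-in verification score (assumption): cosine similarity of raw pixels."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_region_mask(attn, keep_fraction):
    """Boolean mask of the highest-attention positions (True = highlighted)."""
    threshold = np.quantile(attn, 1.0 - keep_fraction)
    return attn >= threshold

# Hypothetical masking techniques: two ways of combining the probe and gallery masks.
COMBINERS = {
    "union": np.logical_or,
    "intersection": np.logical_and,
}

def occlude(image, mask):
    """Replace the masked pixels with the image mean (one possible occlusion choice)."""
    out = image.copy()
    out[mask] = image.mean()
    return out

def explain_pair(probe, gallery, attn_probe, attn_gallery, similarity, keep_fraction=0.2):
    """Pick the masking technique that changes the score most and build a heatmap from it."""
    base_score = similarity(probe, gallery)
    mask_probe = top_region_mask(attn_probe, keep_fraction)
    mask_gallery = top_region_mask(attn_gallery, keep_fraction)

    impacts, masks = {}, {}
    for name, combine in COMBINERS.items():
        mask = combine(mask_probe, mask_gallery)
        masked_score = similarity(occlude(probe, mask), occlude(gallery, mask))
        impacts[name] = abs(base_score - masked_score)  # impact on the decision score
        masks[name] = mask

    best = max(impacts, key=impacts.get)
    # Heatmap: mean attention of the pair, kept only inside the selected mask.
    heatmap = masks[best] * (attn_probe + attn_gallery) / 2.0
    return best, impacts, heatmap
```

In practice the `similarity` argument would be replaced by the score of the face verification model under study, and the change in score would be compared against the decision threshold rather than used as a raw magnitude.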