Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking

dc.contributor.author: Biswas, Rajarshi
dc.contributor.author: Barz, Michael
dc.contributor.author: Sonntag, Daniel
dc.date.accessioned: 2021-04-23T09:36:45Z
dc.date.available: 2021-04-23T09:36:45Z
dc.date.issued: 2020
dc.description.abstract[de]: Image captioning is a challenging multimodal task. Significant improvements could be obtained by deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting their attention mechanism using additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object specific salient regions of the input image. We embed the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance, while it provides explanatory features. Further, we discuss how interactive model improvement can be realized through re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state-of-the-art in image captioning.
dc.identifier.doi: 10.1007/s13218-020-00679-2
dc.identifier.pissn: 1610-1987
dc.identifier.uri: http://dx.doi.org/10.1007/s13218-020-00679-2
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/36327
dc.publisher: Springer
dc.relation.ispartof: KI - Künstliche Intelligenz: Vol. 34, No. 4
dc.relation.ispartofseries: KI - Künstliche Intelligenz
dc.subject: Beam search
dc.subject: Deep learning
dc.subject: Explainable artificial intelligence (XAI)
dc.subject: Image captioning
dc.subject: Interactive machine learning (IML)
dc.subject: Re-ranking
dc.subject: Visual explanations
dc.title[de]: Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking
dc.type: Text/Journal Article
gi.citation.endPage: 584
gi.citation.startPage: 571

Files