Title: Assessing the Human Ability to Recognize Synthetic Speech in Ordinary Conversation
Type: Text/Conference Paper
Authors: Daniel Prudký; Anton Firc
Editors: Damer, Naser; Gomez-Barrero, Marta; Raja, Kiran; Rathgeb, Christian; Sequeira, Ana F.; Todisco, Massimiliano; Uhl, Andreas
Date: 2023-12-12
Year: 2023
ISBN: 978-3-88579-733-3
ISSN: 1617-5468
URL: https://dl.gi.de/handle/20.500.12116/43286
Language: English
Keywords: Speech and speaker recognition; Usability

Abstract: This work assesses the human ability to recognize synthetic speech (deepfakes). The paper describes an experiment in which we communicated with respondents using voice messages. We presented the respondents with a cover story about testing the user-friendliness of voice messages while secretly sending them a pre-prepared deepfake recording during the conversation. We examined their reactions, their knowledge of deepfakes, and how many of them could correctly identify which message was a deepfake. The results show that none of the respondents reacted in any way to the fraudulent deepfake message, and only one retrospectively admitted to noticing anything specific. On the other hand, after the nature of the experiment was revealed, 83.9% of respondents correctly identified the voice message that contained a deepfake. Thus, although the deepfake recording was clearly identifiable among the others, no one reacted to it at the time. In summary, we show that the human ability to recognize voice deepfakes is not at a level we can trust: it is very difficult for people to distinguish between real and fake voices, especially when they do not expect to encounter them.