On divergence-based author obfuscation: An attack on the state of the art in statistical authorship verification

dc.contributor.author: Bevendorff, Janek
dc.contributor.author: Wenzel, Tobias
dc.contributor.author: Potthast, Martin
dc.contributor.author: Hagen, Matthias
dc.contributor.author: Stein, Benno
dc.date.accessioned: 2021-06-21T09:33:46Z
dc.date.available: 2021-06-21T09:33:46Z
dc.date.issued: 2020
dc.description.abstract: Authorship verification is the task of determining whether two texts were written by the same author based on a writing style analysis. Author obfuscation is the adversarial task of preventing a successful verification by altering a text’s style so that it does not resemble that of its original author anymore. This paper introduces new algorithms for both tasks and reports on a comprehensive evaluation to ascertain the merits of the state of the art in authorship verification to withstand obfuscation. After introducing a new generalization of the well-known unmasking algorithm for short texts, thus completing our collection of state-of-the-art algorithms for verification, we introduce an approach that (1) models writing style difference as the Jensen-Shannon distance between the character n-gram distributions of texts, and (2) manipulates an author’s writing style in a sophisticated manner using heuristic search. For obfuscation, we explore the huge space of textual variants in order to find a paraphrased version of the to-be-obfuscated text that has a sufficiently high Jensen-Shannon distance at minimal costs in terms of text quality loss. We analyze, quantify, and illustrate the rationale of this approach, define paraphrasing operators, derive text length-invariant thresholds for termination, and develop an effective obfuscation framework. Our authorship obfuscation approach defeats the presented state-of-the-art verification approaches, while keeping text changes at a minimum. As a final contribution, we discuss and experimentally evaluate a reverse obfuscation attack against our obfuscation approach as well as possible remedies. [en]
dc.identifier.doi: 10.1515/itit-2019-0046
dc.identifier.pissn: 2196-7032
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/36561
dc.language.iso: en
dc.publisher: De Gruyter
dc.relation.ispartof: it - Information Technology: Vol. 62, No. 2
dc.subject: authorship verification
dc.subject: authorship obfuscation
dc.subject: privacy
dc.subject: computational ethics
dc.title: On divergence-based author obfuscation: An attack on the state of the art in statistical authorship verification [en]
dc.type: Text/Journal Article
gi.citation.endPage: 115
gi.citation.publisherPlace: Berlin
gi.citation.startPage: 99
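
The abstract describes modelling writing style difference as the Jensen-Shannon distance between the character n-gram distributions of two texts. The following is a minimal sketch of that measure, not the authors' implementation. It assumes character trigrams (`n=3`), base-2 logarithms, and the distance defined as the square root of the Jensen-Shannon divergence; the function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngram_distribution(text, n=3):
    """Relative frequencies of the character n-grams in `text` (n=3 is an assumption)."""
    if len(text) < n:
        raise ValueError("text must contain at least n characters")
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence between two distributions.

    `p` and `q` map n-grams to probabilities; n-grams missing from one
    distribution are treated as having probability zero.
    """
    divergence = 0.0
    for gram in set(p) | set(q):
        p_g = p.get(gram, 0.0)
        q_g = q.get(gram, 0.0)
        m_g = 0.5 * (p_g + q_g)  # mixture distribution M = (P + Q) / 2
        if p_g > 0.0:
            divergence += 0.5 * p_g * math.log2(p_g / m_g)
        if q_g > 0.0:
            divergence += 0.5 * q_g * math.log2(q_g / m_g)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


def style_distance(text_a, text_b, n=3):
    """Jensen-Shannon distance between the character n-gram distributions of two texts."""
    return jensen_shannon_distance(
        char_ngram_distribution(text_a, n),
        char_ngram_distribution(text_b, n),
    )
```

With base-2 logarithms the value lies between 0 (identical n-gram distributions) and 1 (no n-grams in common). Under this reading, an obfuscator would edit a text until its distance to the author's other writing exceeds some threshold; the threshold and the paraphrasing operators are defined in the paper and are not reproduced here.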

Files