On divergence-based author obfuscation: An attack on the state of the art in statistical authorship verification

Bevendorff, Janek; Wenzel, Tobias; Potthast, Martin; Hagen, Matthias; Stein, Benno

dc.contributor.author	Bevendorff, Janek
dc.contributor.author	Wenzel, Tobias
dc.contributor.author	Potthast, Martin
dc.contributor.author	Hagen, Matthias
dc.contributor.author	Stein, Benno
dc.date.accessioned	2021-06-21T09:33:46Z
dc.date.available	2021-06-21T09:33:46Z
dc.date.issued	2020
dc.description.abstract	Authorship verification is the task of determining whether two texts were written by the same author based on a writing style analysis. Author obfuscation is the adversarial task of preventing a successful verification by altering a text’s style so that it does not resemble that of its original author anymore. This paper introduces new algorithms for both tasks and reports on a comprehensive evaluation to ascertain the merits of the state of the art in authorship verification to withstand obfuscation. After introducing a new generalization of the well-known unmasking algorithm for short texts, thus completing our collection of state-of-the-art algorithms for verification, we introduce an approach that (1) models writing style difference as the Jensen-Shannon distance between the character n-gram distributions of texts, and (2) manipulates an author’s writing style in a sophisticated manner using heuristic search. For obfuscation, we explore the huge space of textual variants in order to find a paraphrased version of the to-be-obfuscated text that has a sufficiently high Jensen-Shannon distance at minimal costs in terms of text quality loss. We analyze, quantify, and illustrate the rationale of this approach, define paraphrasing operators, derive text length-invariant thresholds for termination, and develop an effective obfuscation framework. Our authorship obfuscation approach defeats the presented state-of-the-art verification approaches, while keeping text changes at a minimum. As a final contribution, we discuss and experimentally evaluate a reverse obfuscation attack against our obfuscation approach as well as possible remedies.	en
dc.identifier.doi	10.1515/itit-2019-0046
dc.identifier.pissn	2196-7032
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/36561
dc.language.iso	en
dc.publisher	De Gruyter
dc.relation.ispartof	it - Information Technology: Vol. 62, No. 2
dc.subject	authorship verification
dc.subject	authorship obfuscation
dc.subject	privacy
dc.subject	computational ethics
dc.title	On divergence-based author obfuscation: An attack on the state of the art in statistical authorship verification	en
dc.type	Text/Journal Article
gi.citation.endPage	115
gi.citation.publisherPlace	Berlin
gi.citation.startPage	99
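
The abstract describes modeling writing style difference as the Jensen-Shannon distance between the character n-gram distributions of two texts. The following is a minimal sketch of that measurement only, not the authors' implementation: the n-gram order `n=3`, the base-2 logarithm, and the function names `char_ngram_distribution` and `jensen_shannon_distance` are assumptions chosen for illustration. The heuristic search, paraphrasing operators, and termination thresholds mentioned in the abstract are not shown.

```python
import math
from collections import Counter


def char_ngram_distribution(text, n=3):
    """Relative frequencies of overlapping character n-grams in text.

    The order n=3 is an illustrative assumption, not taken from the record.
    """
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    divergence = 0.0
    for gram in set(p) | set(q):
        pi, qi = p.get(gram, 0.0), q.get(gram, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution M = (P + Q) / 2
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


if __name__ == "__main__":
    original = char_ngram_distribution("the quick brown fox jumps over the lazy dog")
    variant = char_ngram_distribution("a quick brown fox leaps over a lazy dog")
    print(jensen_shannon_distance(original, variant))
```

With a base-2 logarithm the distance is 0 for identical distributions and 1 for distributions that share no n-grams, so an obfuscator in the sense of the abstract would aim to push this value above some threshold while changing the text as little as possible.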

Collections

it - Information Technology 62(2) - April 2020
