On divergence-based author obfuscation: An attack on the state of the art in statistical authorship verification
dc.contributor.author | Bevendorff, Janek | |
dc.contributor.author | Wenzel, Tobias | |
dc.contributor.author | Potthast, Martin | |
dc.contributor.author | Hagen, Matthias | |
dc.contributor.author | Stein, Benno | |
dc.date.accessioned | 2021-06-21T09:33:46Z | |
dc.date.available | 2021-06-21T09:33:46Z | |
dc.date.issued | 2020 | |
dc.description.abstract | Authorship verification is the task of determining whether two texts were written by the same author based on a writing style analysis. Author obfuscation is the adversarial task of preventing a successful verification by altering a text’s style so that it no longer resembles that of its original author. This paper introduces new algorithms for both tasks and reports on a comprehensive evaluation of how well the state of the art in authorship verification withstands obfuscation. After introducing a new generalization of the well-known unmasking algorithm for short texts, thus completing our collection of state-of-the-art algorithms for verification, we introduce an obfuscation approach that (1) models writing style difference as the Jensen-Shannon distance between the character n-gram distributions of texts, and (2) manipulates an author’s writing style using heuristic search. For obfuscation, we explore the huge space of textual variants in order to find a paraphrased version of the to-be-obfuscated text that has a sufficiently high Jensen-Shannon distance at minimal cost in terms of text quality loss. We analyze, quantify, and illustrate the rationale of this approach, define paraphrasing operators, derive text length-invariant thresholds for termination, and develop an effective obfuscation framework. Our authorship obfuscation approach defeats the presented state-of-the-art verification approaches while keeping text changes at a minimum. As a final contribution, we discuss and experimentally evaluate a reverse obfuscation attack against our obfuscation approach as well as possible remedies. | en |
dc.identifier.doi | 10.1515/itit-2019-0046 | |
dc.identifier.pissn | 2196-7032 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/36561 | |
dc.language.iso | en | |
dc.publisher | De Gruyter | |
dc.relation.ispartof | it - Information Technology: Vol. 62, No. 2 | |
dc.subject | authorship verification | |
dc.subject | authorship obfuscation | |
dc.subject | privacy | |
dc.subject | computational ethics | |
dc.title | On divergence-based author obfuscation: An attack on the state of the art in statistical authorship verification | en |
dc.type | Text/Journal Article | |
gi.citation.endPage | 115 | |
gi.citation.publisherPlace | Berlin | |
gi.citation.startPage | 99 |
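
The abstract describes writing style difference as the Jensen-Shannon distance between the character n-gram distributions of two texts. The following is a minimal sketch of that measure only, not the authors' implementation. The function names, the choice of n = 3, the base-2 logarithm, and the absence of any smoothing or text normalization are assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngram_distribution(text, n=3):
    """Relative frequencies of overlapping character n-grams (n=3 is an assumption)."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text is shorter than n characters")
    return {gram: count / total for gram, count in counts.items()}


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    divergence = 0.0
    for gram in set(p) | set(q):
        pi = p.get(gram, 0.0)
        qi = q.get(gram, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution M = (P + Q) / 2
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


def style_distance(text_a, text_b, n=3):
    """Jensen-Shannon distance between the n-gram distributions of two texts."""
    return jensen_shannon_distance(
        char_ngram_distribution(text_a, n),
        char_ngram_distribution(text_b, n),
    )
```

Under these assumptions, `style_distance(a, b)` is 0 for texts with identical n-gram distributions and 1 for texts that share no n-grams. An obfuscator in the sense of the abstract would search for paraphrases that push this value above a termination threshold; how that threshold and the paraphrasing operators are defined is specified in the paper, not here.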