Detection of Generated Text Reviews by Leveraging Methods from Authorship Attribution: Predictive Performance vs. Resourcefulness

Moosleitner, Manfred; Specht, Günther; Zangerle, Eva

dc.contributor.author	Moosleitner, Manfred
dc.contributor.author	Specht, Günther
dc.contributor.author	Zangerle, Eva
dc.contributor.editor	König-Ries, Birgitta
dc.contributor.editor	Scherzinger, Stefanie
dc.contributor.editor	Lehner, Wolfgang
dc.contributor.editor	Vossen, Gottfried
dc.date.accessioned	2023-02-23T13:59:45Z
dc.date.available	2023-02-23T13:59:45Z
dc.date.issued	2023
dc.description.abstract	Textual reviews are an integral part of online shopping and a source of information for potential customers. However, a prerequisite is that the reviews are authentic. Yet pre-trained large language models have been shown to generate convincing text reviews at scale. Therefore, a critical task is the automatic detection of reviews not composed by a human, in a generated review classification task. State-of-the-art approaches to detecting generated texts use pre-trained large language models, which have hefty hardware requirements for running and fine-tuning the model. Related work has shown that texts generated by language models often differ in writing style and choice of words from texts written by humans. These two properties, which are unique per author, should therefore be usable to identify whether a text was generated by such models. In this paper, we investigate the performance of features prominently used in authorship attribution tasks, using robust classifiers that require substantially lower computational resources. We show that features and methods from authorship attribution can be successfully applied to the task of detecting generated text reviews, leveraging the consistent writing style exhibited by large language models like GPT2. We argue that our approach achieves performance similar to state-of-the-art approaches while providing shorter training times and lower hardware requirements, which are necessary for, e.g., on-the-fly detection.	en
dc.identifier.doi	10.18420/BTW2023-11
dc.identifier.isbn	978-3-88579-725-8
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/40315
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik e.V.
dc.relation.ispartof	BTW 2023
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-331
dc.subject	Text Classification
dc.subject	Stylometric Text Features
dc.subject	Generated Text Detection
dc.title	Detection of Generated Text Reviews by Leveraging Methods from Authorship Attribution: Predictive Performance vs. Resourcefulness	en
dc.type	Text/Conference Paper
gi.citation.endPage	232
gi.citation.publisherPlace	Bonn
gi.citation.startPage	221
gi.conference.date	March 6-10, 2023
gi.conference.location	Dresden, Germany
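
The abstract describes detecting generated reviews from writing-style features with a lightweight classifier. The following Python is a minimal sketch of that general idea, not the authors' implementation. The chosen features, the `FUNCTION_WORDS` list, the `NearestCentroid` class, and the two toy reviews are assumptions made purely for illustration; the features and classifiers evaluated in the paper may differ.

```python
import math
import re
import string

# Hypothetical, very small function-word list (an assumption for this sketch);
# stylometric studies typically use much larger lists.
FUNCTION_WORDS = ("the", "a", "and", "of", "to", "in", "it", "is", "i", "this")


def stylometric_features(text):
    """Map a text to a small vector of style features."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    features = [
        sum(len(w) for w in words) / n_words,                    # mean word length
        len(set(words)) / n_words,                               # type-token ratio
        sum(ch in string.punctuation for ch in text) / n_chars,  # punctuation rate
    ]
    # Relative frequency of each function word.
    features += [words.count(fw) / n_words for fw in FUNCTION_WORDS]
    return features


def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


class NearestCentroid:
    """Assign a text to the class whose mean feature vector is closest."""

    def fit(self, texts, labels):
        by_label = {}
        for text, label in zip(texts, labels):
            by_label.setdefault(label, []).append(stylometric_features(text))
        self.centroids_ = {lab: centroid(vecs) for lab, vecs in by_label.items()}
        return self

    def predict(self, text):
        vec = stylometric_features(text)
        return min(self.centroids_,
                   key=lambda lab: math.dist(vec, self.centroids_[lab]))


# Made-up toy reviews, not data from the paper.
human_review = "I loved it!!! Honestly, the best purchase of my life, wow."
generated_review = "This product is a good product and this product is of good quality."

model = NearestCentroid().fit([human_review, generated_review],
                              ["human", "generated"])
prediction = model.predict("This item is a good item and it is of good quality.")
```

A classifier of this kind trains in a single pass over the feature vectors and needs no GPU, which is the resource trade-off the abstract refers to. A realistic setup would need a labeled corpus, feature scaling, and a stronger classifier.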

Files

Original bundle

Name: B2-4.pdf
Size: 287.53 KB
Format: Adobe Portable Document Format

Collections

P331 - BTW2023- Datenbanksysteme für Business, Technologie und Web