
Detection of Generated Text Reviews by Leveraging Methods from Authorship Attribution: Predictive Performance vs. Resourcefulness

dc.contributor.author: Moosleitner, Manfred
dc.contributor.author: Specht, Günther
dc.contributor.author: Zangerle, Eva
dc.contributor.editor: König-Ries, Birgitta
dc.contributor.editor: Scherzinger, Stefanie
dc.contributor.editor: Lehner, Wolfgang
dc.contributor.editor: Vossen, Gottfried
dc.date.accessioned: 2023-02-23T13:59:45Z
dc.date.available: 2023-02-23T13:59:45Z
dc.date.issued: 2023
dc.description.abstract: Textual reviews are an integral part of online shopping and a source of information for potential customers. However, a prerequisite is that the reviews are authentic. Yet pre-trained large language models have been shown to generate convincing text reviews at scale. A critical task is therefore the automatic detection of reviews not composed by a human, framed as a generated review classification task. State-of-the-art approaches to detecting generated texts use pre-trained large language models, which impose hefty hardware requirements for running and fine-tuning the model. Related work has shown that texts generated by language models often differ in writing style and choice of words from texts written by humans. These two properties, which are unique per author, should be usable to identify whether a text was generated by such models. In this paper, we investigate the performance of features prominently used in authorship attribution tasks, using robust classifiers that require substantially lower computational resources. We show that features and methods from authorship attribution can be successfully applied to the task of detecting generated text reviews, leveraging the consistent writing style exhibited by large language models like GPT2. We argue that our approach achieves performance similar to state-of-the-art approaches while providing shorter training times and lower hardware requirements, which are necessary for, e.g., detection on the fly.
dc.identifier.doi: 10.18420/BTW2023-11
dc.identifier.isbn: 978-3-88579-725-8
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/40315
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: BTW 2023
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-331
dc.subject: Text Classification
dc.subject: Stylometric Text Features
dc.subject: Generated Text Detection
dc.title: Detection of Generated Text Reviews by Leveraging Methods from Authorship Attribution: Predictive Performance vs. Resourcefulness
dc.type: Text/Conference Paper
gi.citation.endPage: 232
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 221
gi.conference.date: 6-10 March 2023
gi.conference.location: Dresden, Germany

Files

Original bundle
Name: B2-4.pdf
Size: 287.53 KB
Format: Adobe Portable Document Format