Conference paper

Detecting plagiarism in text documents through grammar-analysis of authors

Document type
Text/Conference Paper
Date
2013
Journal title
Journal ISSN
Volume title
Source
Datenbanksysteme für Business, Technologie und Web (BTW) 2013
Regular Research Papers
Publisher
Gesellschaft für Informatik e.V.
Abstract
The task of intrinsic plagiarism detection is to find plagiarized sections within text documents without using a reference corpus. This paper presents the intrinsic detection approach Plag-Inn, which is based on the assumption that authors use a recognizable and distinguishable grammar to construct sentences. The main idea is to analyze the grammar of text documents and to find irregularities in the syntax of sentences, regardless of the concrete words used. Suspicious sentences are identified by computing the pq-gram distance between grammar trees and by utilizing a Gaussian normal distribution; the algorithm then tries to select and combine those sentences into potentially plagiarized sections. The parameters and thresholds needed by the algorithm are optimized using genetic algorithms. Finally, the approach is evaluated against a large test corpus consisting of English documents, showing promising results.
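The outlier step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a mean pq-gram distance to all other sentences has already been computed for each sentence, and the function names `suspicious_sentences` and `group_sections`, the threshold factor `k`, and the `max_gap` parameter are hypothetical (the paper tunes its actual parameters with genetic algorithms).

```python
from statistics import mean, pstdev


def suspicious_sentences(distances, k=2.0):
    """Return indices of sentences whose distance exceeds mu + k * sigma.

    `distances` holds one value per sentence (assumed here to be its mean
    pq-gram distance to all other sentences). `k` is an illustrative
    threshold, not a value taken from the paper.
    """
    mu = mean(distances)
    sigma = pstdev(distances)
    if sigma == 0:
        return []  # all sentences look alike, nothing stands out
    return [i for i, d in enumerate(distances) if d > mu + k * sigma]


def group_sections(indices, max_gap=1):
    """Merge flagged sentence indices into (start, end) sections.

    Two flagged sentences end up in the same section when their indices
    differ by at most `max_gap` (hypothetical parameter).
    """
    sections = []
    for i in sorted(indices):
        if sections and i - sections[-1][1] <= max_gap:
            sections[-1] = (sections[-1][0], i)
        else:
            sections.append((i, i))
    return sections


# Toy input: sentences 10 and 11 differ strongly from the rest.
toy = [0.1] * 10 + [0.9, 0.95] + [0.1] * 10
flagged = suspicious_sentences(toy)
sections = group_sections(flagged)
```

On the toy input, only the two high-distance sentences lie more than two standard deviations above the mean, so they are flagged and merged into a single candidate section.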
Description
Tschuggnall, Michael; Specht, Günther (2013): Detecting plagiarism in text documents through grammar-analysis of authors. Datenbanksysteme für Business, Technologie und Web (BTW) 2013. Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 978-3-88579-608-4. pp. 241-259. Regular Research Papers. Magdeburg. 13-15 March 2013