Topic detection based on the PageRank's clustering property

This paper introduces a method that uses PageRank calculations to cluster graphs of semantically related terms extracted from texts. It is intended for text mining, e.g. to automatically discover the different topics in a text corpus. The method is evaluated empirically by applying it to real text corpora. The results show that this application of the PageRank formula yields suitable clusterings in which the mean similarity between the terms of a cluster reaches a high level. A special state transition in the mean term similarity, which appears when analysing texts that contain stopwords, is also discussed.
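To make the idea of running PageRank on a term graph more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the term graph is built or how clusters are derived from the PageRank values, so the sketch only covers an assumed scoring step. The sentence-level co-occurrence weighting, the function names `cooccurrence_graph` and `pagerank`, and the damping factor of 0.85 are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Build an undirected, weighted term graph.

    Assumption: two terms count as related when they occur in the same
    sentence; the edge weight is the number of such sentences.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        terms = sorted(set(sentence.lower().split()))
        for a, b in combinations(terms, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank computed by plain power iteration."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Total edge weight leaving each node, used to normalise its votes.
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # The graph is undirected, so a node's neighbours are also
            # the nodes that pass rank to it.
            incoming = sum(
                rank[nb] * w / out_weight[nb]
                for nb, w in graph[node].items()
            )
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


if __name__ == "__main__":
    corpus = [
        "graph rank cluster",
        "graph rank topic",
        "cat dog",
    ]
    scores = pagerank(cooccurrence_graph(corpus))
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {score:.4f}")
```

In this toy corpus the terms that co-occur most often receive the highest scores. How such scores are turned into topic clusters is the subject of the paper itself and is not reproduced here.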

Kubek, Mario; Unger, Herwig (2011): Topic detection based on the PageRank's clustering property. 11th International Conference on Innovative Internet Community Systems (I2CS 2011). Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 978-3-88579-280-2. pp. 139-148. Regular Research Papers. Berlin. June 15-17, 2011.

Collections

P186 - I2CS 2011: 11th International Conference on Innovative Internet Community Systems
