Listing by author: "Kubek, Mario"
- Conference paper: Detecting source topics by analysing directed co-occurrence graphs (12th International Conference on Innovative Internet Community Services (I2CS 2012), 2012). Kubek, Mario; Unger, Herwig. This paper describes a new method to determine the sources of topics that influence the main topics in texts, by analysing directed co-occurrence graphs with an extended version of the HITS algorithm. Additionally, this method can be used to identify characteristic terms in texts. To obtain the required directed term relations, the notion of term association is introduced to cover asymmetric real-life relationships between concepts, and it is described how these associations can be calculated by statistical means. The experiments show that the detected source topics and characteristic terms can be used to find similar documents, as well as documents that mainly deal with these topics, in large corpora such as the World Wide Web. By repeating this iteratively, users can easily follow topics by analysing documents from these corpora with this method. In this way, interactive search systems can offer users a new search function that goes beyond a simple presentation of similar documents. This application is elaborated on as well.
- Conference paper: Topic detection based on the PageRank's clustering property (11th International Conference on Innovative Internet Community Systems (I2CS 2011), 2011). Kubek, Mario; Unger, Herwig. This paper introduces a method that uses PageRank calculations to cluster graphs of semantically related terms from texts, for use in text mining, e.g. to automatically discover different topics in a text corpus. The method is evaluated empirically by applying it to real text corpora. It is shown that this application of the PageRank formula produces suitable clusters, in which the mean similarity between the terms reaches a high level. A special state transition in the mean term similarity that occurs when analysing texts containing stopwords is also discussed.
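The first entry combines asymmetric term associations with HITS. The abstract does not give the authors' association measure or their HITS extension, so the following is only a minimal sketch under stated assumptions. It assumes that the association from term a to term b is the conditional probability P(b | a), estimated as the fraction of sentences containing a that also contain b, and it runs plain weighted HITS, not the extended version from the paper. The function names `term_associations` and `hits` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def term_associations(sentences):
    """Hypothetical asymmetric association (an assumption, not the paper's formula):
    weight(a -> b) = P(b | a) = (#sentences with a and b) / (#sentences with a)."""
    freq = Counter()  # number of sentences containing each term
    pair = Counter()  # number of sentences containing each unordered term pair
    for sentence in sentences:
        terms = set(sentence)
        freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair[(a, b)] += 1
    edges = {}
    for (a, b), n in pair.items():
        # The same co-occurrence count gives two different directed weights.
        edges[(a, b)] = n / freq[a]
        edges[(b, a)] = n / freq[b]
    return edges


def hits(edges, iterations=50):
    """Plain weighted HITS on a directed graph given as {(source, target): weight}."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority score: weighted sum of hub scores of the nodes pointing to it.
        new_auth = dict.fromkeys(nodes, 0.0)
        for (a, b), w in edges.items():
            new_auth[b] += w * hub[a]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub score: weighted sum of authority scores of the nodes it points to.
        new_hub = dict.fromkeys(nodes, 0.0)
        for (a, b), w in edges.items():
            new_hub[a] += w * auth[b]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


if __name__ == "__main__":
    sentences = [["graph", "term", "topic"], ["graph", "topic"], ["graph"]]
    hub, auth = hits(term_associations(sentences))
    print("top authority:", max(auth, key=auth.get))
    print("top hub:", max(hub, key=hub.get))
```

Reading high hub scores as candidate "source" terms and high authority scores as main topics is one plausible interpretation. The abstract does not confirm this mapping.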
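The second entry builds on PageRank calculations over a graph of related terms. The abstract does not explain how clusters are derived from the scores, so this sketch covers only the underlying computation: standard PageRank by power iteration on a weighted, undirected term graph stored as a symmetric adjacency dictionary. The function name `pagerank`, the damping factor 0.85 and the fixed iteration count are illustrative assumptions, not taken from the paper.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Standard PageRank by power iteration.

    graph: dict mapping each term to a dict {neighbour: edge weight}.
    Every neighbour must also appear as a key; for an undirected term graph,
    store each edge in both directions.
    """
    nodes = sorted(graph)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    for _ in range(iterations):
        # Mass of nodes without outgoing edges is spread uniformly over all nodes.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = dict.fromkeys(nodes, base)
        for u in nodes:
            if out_weight[u] == 0:
                continue
            # Each node passes its rank on in proportion to its edge weights.
            for v, w in graph[u].items():
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


if __name__ == "__main__":
    star = {
        "text": {"mining": 1.0, "corpus": 1.0, "topic": 1.0},
        "mining": {"text": 1.0},
        "corpus": {"text": 1.0},
        "topic": {"text": 1.0},
    }
    for term, score in sorted(pagerank(star).items(), key=lambda kv: -kv[1]):
        print(f"{term}: {score:.4f}")
```

The scores always sum to 1, so they can be compared across terms. In this toy star graph the central term collects the most rank.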