Classifying documents by distributed P2P clustering

Eisenhardt, Martin; Müller, Wolfgang; Henrich, Andreas; Dittrich, Klaus R.; König, Wolfgang; Oberweis, Andreas; Rannenberg, Kai; Wahlster, Wolfgang
2019-11-14; 2019-11-14; 2003
ISBN: 3-88579-364-4
https://dl.gi.de/handle/20.500.12116/29710

Clustering documents into classes is an important task in many Information Retrieval (IR) systems. The resulting grouping makes it possible to describe the contents of a document collection in terms of the classes its documents fall into. The compactness of such a description is even more desirable when the document collection is spread across different computers and locations: document classes can then be used to describe each partial document collection in a conveniently short form that can easily be exchanged with other nodes on the network. Unfortunately, most clustering schemes cannot easily be distributed. Additionally, the cost of transferring all data to a central clustering service is prohibitive in large-scale systems. In this paper, we introduce an approach that is capable of classifying documents distributed across a Peer-to-Peer (P2P) network. We present measurements taken on a P2P network using synthetic and real-world data sets.

Language: en
Type: Text/Conference Paper
ISSN: 1617-5468