Classifying documents by distributed P2P clustering
Abstract
Clustering documents into classes is an important task in many Information Retrieval (IR) systems. The resulting grouping makes it possible to describe the contents of a document collection in terms of the classes its documents fall into. A compact description of this kind is even more desirable when the document collection is spread across different computers and locations: document classes can then describe each partial collection in a short form that is easily exchanged with other nodes on the network. Unfortunately, most clustering schemes cannot easily be distributed, and the cost of transferring all data to a central clustering service is prohibitive in large-scale systems. In this paper, we introduce an approach that is capable of classifying documents distributed across a Peer-to-Peer (P2P) network. We present measurements taken on a P2P network using synthetic and real-world data sets.
- Citation
- BibTeX
Eisenhardt, M., Müller, W. & Henrich, A. (2003). Classifying documents by distributed P2P clustering. In: Dittrich, K. R., König, W., Oberweis, A., Rannenberg, K. & Wahlster, W. (Eds.), INFORMATIK 2003 – Innovative Informatikanwendungen, Band 2, Beiträge der 33. Jahrestagung der Gesellschaft für Informatik e.V. (GI). Bonn: Gesellschaft für Informatik e.V. (pp. 286-291).
@inproceedings{mci/Eisenhardt2003,
author = {Eisenhardt, Martin AND Müller, Wolfgang AND Henrich, Andreas},
title = {Classifying documents by distributed P2P clustering},
booktitle = {INFORMATIK 2003 – Innovative Informatikanwendungen, Band 2, Beiträge der 33. Jahrestagung der Gesellschaft für Informatik e.V. (GI)},
year = {2003},
editor = {Dittrich, Klaus R. AND König, Wolfgang AND Oberweis, Andreas AND Rannenberg, Kai AND Wahlster, Wolfgang},
pages = {286--291},
publisher = {Gesellschaft für Informatik e.V.},
address = {Bonn}
}
Files | Size | Format | View |
---|---|---|---|
GI-Proceedings.35-51.pdf | 103.1Kb | | View/ |
More Info
ISBN: 3-88579-364-4
ISSN: 1617-5468
Date: 2003
Language: en

Content Type: Text/Conference Paper