Conference paper

Selection of representative documents for clusters in a document collection

Document type
Text/Conference Paper
Date
2003
Source
Natural language processing and information systems
Regular Research Papers
Publisher
Gesellschaft für Informatik e.V.
Abstract
An efficient way to explore a large document collection (e.g., the results returned by a search engine) is to subdivide it into clusters of relatively similar documents, which gives a general view of the collection and lets the user select the parts of particular interest. One way of presenting the clusters to the user is to select a representative document from each cluster. This can be done in different ways for different purposes. We consider three cases: selection of the average, the "most typical," and the "least typical" document. We give algorithms that rely on a dictionary of keywords reflecting the topic of the user's interest. After clustering, we select a document in each cluster based on its closeness to the other documents. Different distance measures are discussed, and preliminary experimental results are presented. Our approach was implemented in the new version of the Document Classifier system.
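The abstract does not spell out the selection rules, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that each document is a vector of keyword counts over the dictionary, that the "average" document is the one nearest the cluster centroid, and that the "most typical" and "least typical" documents have the smallest and largest mean distance to the other documents. Cosine distance and all function names (`keyword_vector`, `cosine_distance`, `select_representatives`) are illustrative assumptions.

```python
import math
from collections import Counter


def keyword_vector(text, dictionary):
    """Count how often each dictionary keyword occurs in the text (assumed representation)."""
    counts = Counter(text.lower().split())
    return [counts[keyword] for keyword in dictionary]


def cosine_distance(u, v):
    """1 - cosine similarity; one possible distance measure, chosen here for illustration."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def select_representatives(vectors, distance=cosine_distance):
    """Return indices of the assumed average, most typical and least typical documents."""
    n = len(vectors)
    dim = len(vectors[0])

    # Assumption: "average" means the document closest to the cluster centroid.
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    to_centroid = [distance(v, centroid) for v in vectors]

    # Assumption: typicality is measured by the mean distance to the other documents.
    mean_to_others = [
        sum(distance(vectors[i], vectors[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]

    indices = range(n)
    return {
        "average": min(indices, key=to_centroid.__getitem__),
        "most_typical": min(indices, key=mean_to_others.__getitem__),
        "least_typical": max(indices, key=mean_to_others.__getitem__),
    }
```

The `distance` parameter is left pluggable because the abstract mentions that several distance measures are compared; any function taking two vectors and returning a number could be passed in its place.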
Description
Gelbukh, Alexander; Alexandrov, Mikhail; Bourek, Ales; Makagonov, Pavel (2003): Selection of representative documents for clusters in a document collection. Natural language processing and information systems. Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 3-88579-358-X. pp. 120-126. Regular Research Papers. Burg (Spreewald). June 2003