Selection of representative documents for clusters in a document collection
dc.contributor.author | Gelbukh, Alexander | |
dc.contributor.author | Alexandrov, Mikhail | |
dc.contributor.author | Bourek, Ales | |
dc.contributor.author | Makagonov, Pavel | |
dc.contributor.editor | Düsterhöft, Antje | |
dc.contributor.editor | Thalheim, Bernhard | |
dc.date.accessioned | 2019-11-14T11:21:40Z | |
dc.date.available | 2019-11-14T11:21:40Z | |
dc.date.issued | 2003 | |
dc.description.abstract | An efficient way to explore a large document collection (e.g., the results returned by a search engine) is to subdivide it into clusters of relatively similar documents, which gives a general view of the collection and lets the user select the parts of particular interest. One way of presenting the clusters to the user is to select a representative document from each cluster. For different purposes this can be done in different ways. We consider three cases: selection of the average, the "most typical," and the "least typical" document. We give algorithms that rely on a dictionary of keywords reflecting the topic of the user's interest. After clustering, we select a document in each cluster based on its closeness to the other documents. Different distance measures are discussed, and preliminary experimental results are presented. Our approach was implemented in the new version of the Document Classifier system. | en |
dc.identifier.isbn | 3-88579-358-X | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/29880 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | Natural language processing and information systems | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-29 | |
dc.title | Selection of representative documents for clusters in a document collection | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 126 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 120 | |
gi.conference.date | June 2003 | |
gi.conference.location | Burg (Spreewald) | |
gi.conference.sessiontitle | Regular Research Papers |
Files
Original bundle
- Name: GI-Proceedings.29-10.pdf
- Size: 126.27 KB
- Format: Adobe Portable Document Format