Conference paper

On the impact of document representation on classifier performance in e-mail categorization

Document type

Text/Conference Paper

Date

2005

Publisher

Gesellschaft für Informatik e.V.

Abstract

This paper analyzes multi-class e-mail categorization performance. To this end, it compares the quality of various classification algorithms under two distinct document representations: a standard word-based representation and a character n-gram representation. The latter is regarded as highly noise-tolerant and was originally proposed for automatic language identification and as a convenient means of producing compact document indices. Furthermore, the paper explores how using available e-mail-specific meta-information affects classification performance and presents the findings.
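
The two document representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tokenization rule, the choice of trigrams (n = 3), and the cosine comparison are all assumptions made for illustration. It shows why a character n-gram representation is considered noise-tolerant: two misspelled words share no word features with their correct forms, but still share most of their character trigrams.

```python
import math
import re
from collections import Counter


def word_features(text):
    """Word-based representation: counts of lowercase alphanumeric tokens (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text, n=3):
    """Character n-gram representation: counts of overlapping n-grams (n=3 is an assumption)."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


clean = "meeting schedule"
noisy = "meetting shedule"  # two typos, as might occur in informal e-mail

word_sim = cosine(word_features(clean), word_features(noisy))
ngram_sim = cosine(char_ngram_features(clean), char_ngram_features(noisy))
print(f"word-based similarity:  {word_sim:.2f}")
print(f"char 3-gram similarity: {ngram_sim:.2f}")
```

In this toy case the word-based vectors have no feature in common, while the trigram vectors still overlap substantially. Whether that translates into better classifier performance on real e-mail is the question the paper investigates.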

Description

Berger, Helmut; Köhle, Monika; Merkl, Dieter (2005): On the impact of document representation on classifier performance in e-mail categorization. Information systems technology and its applications, ISTA' 2005. Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 3-88579-392-X. pp. 19-30. Regular Research Papers. Palmerston North, New Zealand. 23-25 May 2005
