Conference paper
On the impact of document representation on classifier performance in e-mail categorization
Document type
Text/Conference Paper
Date
2005
Authors
Source
Information systems technology and its applications, ISTA' 2005
Regular Research Papers
Publisher
Gesellschaft für Informatik e.V.
Abstract
This paper analyzes multi-class e-mail categorization performance. To this end, the quality of various classification algorithms is compared across two distinct document representation formalisms: a standard word-based representation and a character n-gram representation. The latter is regarded as highly noise-tolerant and was originally proposed for automatic language identification and as a convenient means of producing compact document indices. Furthermore, the impact of using available e-mail-specific meta-information on classification performance is explored, and the findings are presented.
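To make the two representations concrete, the following is a minimal sketch, not the paper's implementation. The function names, the tokenization rule, the choice of n = 3, and the Jaccard overlap measure are all assumptions made for illustration; the abstract does not specify any of them.

```python
import re
from collections import Counter


def word_features(text):
    """Word-based representation: counts of lowercase alphanumeric tokens.

    The tokenization rule here is an assumption, not taken from the paper.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text, n=3):
    """Character n-gram representation: counts of overlapping n-character windows.

    Whitespace is collapsed to single spaces first; n=3 is an arbitrary choice.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def overlap(a, b):
    """Jaccard overlap between the feature sets of two documents."""
    keys_a, keys_b = set(a), set(b)
    return len(keys_a & keys_b) / len(keys_a | keys_b)


# A hypothetical subject line and a copy containing a typo.
clean = "meeting tomorrow"
noisy = "meetnig tomorrow"

word_overlap = overlap(word_features(clean), word_features(noisy))
ngram_overlap = overlap(char_ngram_features(clean), char_ngram_features(noisy))
print(f"word overlap:   {word_overlap:.2f}")
print(f"n-gram overlap: {ngram_overlap:.2f}")
```

In this toy case the misspelled word shares no word-level feature with its correct form, while most of its character trigrams survive the typo. That is one intuition behind the noise tolerance attributed to n-gram representations.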