Conference paper

A comparative study on feature weight in Thai document categorization

Document type

Text/Conference Paper

Date

2010

Publisher

Gesellschaft für Informatik e.V.

Abstract

Text categorization is the process of automatically assigning predefined categories to free-text documents. Feature weighting, which calculates feature (term) values in documents, is one of the important preprocessing techniques in text categorization. This paper is a comparative study of feature weighting methods for statistical learning in a Thai document categorization framework. Six methods were evaluated: Boolean, tf, tf×idf, tfc, ltc, and entropy weighting. We evaluated these methods on a Thai news article corpus with three supervised learning classifiers: Support Vector Machine (SVM), Decision Tree (DT), and Naïve Bayes (NB). We found that the ltc weighting method was the most effective in our experiments with the SVM and DT algorithms, while entropy and Boolean weighting were more effective than the other weighting methods with the NB algorithm. Using ltc weighting with an SVM classifier yielded very high classification performance, with an F1 measure of 96%.
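
The abstract names the six weighting schemes but does not give their formulas. The following is a minimal sketch of how such weights are commonly computed, assuming the textbook definitions (raw term frequency, idf = log(N / df), cosine normalization for tfc and ltc, and a log-entropy global weight for the entropy scheme). The function name `weight_documents`, the scheme labels, and the toy token lists are hypothetical; the paper's exact formulas and its Thai word segmentation step may differ.

```python
import math
from collections import Counter


def weight_documents(docs, scheme="ltc"):
    """Return one {term: weight} dict per document.

    docs   : list of token lists (tokenization is assumed to be done already)
    scheme : "boolean", "tf", "tfidf", "tfc", "ltc", or "entropy"
    Formulas are the commonly cited ones, not taken from the paper.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Document frequency (number of documents containing each term)
    # and global frequency (total occurrences in the collection).
    df, gf = Counter(), Counter()
    for c in counts:
        df.update(c.keys())
        gf.update(c)
    idf = {t: math.log(n_docs / df[t]) for t in df}

    # Log-entropy global weight: 1 + sum_j p_ij * log(p_ij) / log(N),
    # with p_ij = f_ij / gf_i. Guard against log(1) = 0 for a single document.
    log_n = math.log(n_docs) if n_docs > 1 else 1.0
    ent = {}
    for t in gf:
        s = sum((c[t] / gf[t]) * math.log(c[t] / gf[t]) for c in counts if c[t] > 0)
        ent[t] = 1.0 + s / log_n

    weighted = []
    for c in counts:
        if scheme == "boolean":
            w = {t: 1.0 for t in c}
        elif scheme == "tf":
            w = {t: float(f) for t, f in c.items()}
        elif scheme in ("tfidf", "tfc"):
            w = {t: f * idf[t] for t, f in c.items()}
        elif scheme == "ltc":
            w = {t: math.log(f + 1.0) * idf[t] for t, f in c.items()}
        elif scheme == "entropy":
            w = {t: math.log(f + 1.0) * ent[t] for t, f in c.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")

        # tfc and ltc divide by the document vector's Euclidean length.
        if scheme in ("tfc", "ltc"):
            norm = math.sqrt(sum(v * v for v in w.values()))
            if norm > 0:
                w = {t: v / norm for t, v in w.items()}
        weighted.append(w)
    return weighted


if __name__ == "__main__":
    # Toy, already-tokenized documents (placeholders, not the paper's corpus).
    toy_docs = [["a", "b", "a"], ["b", "c"], ["c", "c", "d"]]
    for name in ("boolean", "tf", "tfidf", "tfc", "ltc", "entropy"):
        print(name, weight_documents(toy_docs, name)[0])
```

The resulting dictionaries can be turned into sparse feature vectors for any classifier; the length normalization in tfc and ltc keeps long documents from dominating purely because of their size.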

Description

Chirawichitchai, Nivet; Sa-nguansat, Parinya; Meesad, Phayung (2010): A comparative study on feature weight in Thai document categorization. 10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 –. Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 978-3-88579-259-8. pp. 257-266. Regular Research Papers. Bangkok, Thailand. June 3-5, 2010
