A comparative study on feature weight in Thai document categorization

dc.contributor.author: Chirawichitchai, Nivet
dc.contributor.author: Sa-nguansat, Parinya
dc.contributor.author: Meesad, Phayung
dc.contributor.editor: Eichler, Gerald
dc.contributor.editor: Kropf, Peter
dc.contributor.editor: Lechner, Ulrike
dc.contributor.editor: Meesad, Phayung
dc.contributor.editor: Unger, Herwig
dc.date.accessioned: 2019-01-11T09:33:33Z
dc.date.available: 2019-01-11T09:33:33Z
dc.date.issued: 2010
dc.description.abstract: Text categorization is the process of automatically assigning predefined categories to free-text documents. Feature weighting, which calculates feature (term) values in documents, is one of the important preprocessing techniques in text categorization. This paper is a comparative study of feature weighting methods for statistical learning in a Thai document categorization framework. Six methods were evaluated: Boolean, tf, tf×idf, tfc, ltc, and entropy weighting. We evaluated these methods on a Thai news article corpus with three supervised learning classifiers: Support Vector Machine (SVM), Decision Tree (DT), and Naïve Bayes (NB). We found that the ltc weighting method was the most effective in our experiments with the SVM and DT algorithms, while entropy and Boolean weighting were more effective than the other weighting methods with the NB algorithm. Using ltc weighting with an SVM classifier yielded a very high classification performance, with an F1 measure of 96%. [en]
dc.identifier.isbn: 978-3-88579-259-8
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/19021
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: 10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 –
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-165
dc.title: A comparative study on feature weight in Thai document categorization [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 266
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 257
gi.conference.date: June 3-5, 2010
gi.conference.location: Bangkok, Thailand
gi.conference.sessiontitle: Regular Research Papers
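
The abstract names several feature weighting schemes without defining them. As a rough illustration only, the sketch below computes five of them (Boolean, tf, tf×idf, tfc, ltc) for a tokenized corpus. The formulas are the commonly cited textbook forms and are an assumption here, since this record does not reproduce the paper's own definitions; entropy weighting is omitted. The function name `weight_documents` and the scheme labels are hypothetical.

```python
import math
from collections import Counter


def weight_documents(docs, scheme="ltc"):
    """Return one {term: weight} dict per document.

    docs   : list of token lists (already segmented into words)
    scheme : "boolean", "tf", "tfidf", "tfc", or "ltc"

    Assumed definitions (not taken from the paper itself):
      boolean : 1 if the term occurs in the document
      tf      : raw term frequency
      tfidf   : tf * log(N / df)
      tfc     : tfidf, scaled to unit Euclidean length per document
      ltc     : log(tf + 1) * log(N / df), scaled to unit length
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Document frequency: number of documents containing each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    idf = {term: math.log(n_docs / df[term]) for term in df}

    weighted = []
    for c in counts:
        if scheme == "boolean":
            w = {t: 1.0 for t in c}
        elif scheme == "tf":
            w = {t: float(f) for t, f in c.items()}
        elif scheme in ("tfidf", "tfc"):
            w = {t: f * idf[t] for t, f in c.items()}
        elif scheme == "ltc":
            w = {t: math.log(f + 1.0) * idf[t] for t, f in c.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")

        # tfc and ltc add cosine (length) normalization.
        if scheme in ("tfc", "ltc"):
            norm = math.sqrt(sum(v * v for v in w.values()))
            if norm > 0:
                w = {t: v / norm for t, v in w.items()}
        weighted.append(w)
    return weighted
```

The resulting dictionaries could be converted to sparse vectors and passed to any classifier; the normalization step in tfc and ltc keeps long documents from dominating purely because of their length.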

Files

Name: 257.pdf
Size: 263.48 KB
Format: Adobe Portable Document Format