Browsing by author "Haruechaiyasak, Choochart"
1 - 4 of 4
- Conference paper: Classifying business types from Twitter posts using active learning (10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 –, 2010) Thongsuk, Chanattha; Haruechaiyasak, Choochart; Meesad, Phayung
  Today, many companies have adopted Twitter as an additional marketing medium to advertise and promote their business activities. One possible solution for organizing a large number of posts is to classify them into predefined categories of business types. Applying a standard text categorization technique to Twitter is ineffective because each post is short (140-character limit) and a large amount of the data is unlabeled. In this paper, we propose a text categorization approach based on the active learning technique for classifying Twitter posts into three business types, i.e., airline, food, and computer & technology. Applying active learning, we start by constructing an initial text categorization model from a small set of labeled data. Using this model, we obtain more positive data instances for constructing a new model by selecting the test data that are predicted as positive. As the experimental results show, our proposed approach based on active learning increased the classification accuracy over the standard text categorization approach.
- Conference paper: Improving ASR for continuous Thai words using ANN/HMM (10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 –, 2010) Sodanil, Maleerat; Nitsuwat, Supot; Haruechaiyasak, Choochart
  The baseline system for automatic speech recognition normally uses Mel-Frequency Cepstral Coefficients (MFCC) as feature vectors. However, for a tonal language like Thai, tone information is an important feature that can be used to improve recognition accuracy. This paper proposes a method for building an acoustic model for Thai ASR using a combination of MFCC and tone information as the input feature vector. In addition, we apply Artificial Neural Network (ANN) multilayer perceptrons to estimate the posterior probabilities of a class model given a sequence of input observations. The performance of the ANN approach is compared with the Gaussian Mixture Model (GMM) used in the Hidden Markov Model Toolkit (HTK). The experiments were carried out with 2-gram and 3-gram language models. The training and test data sets were prepared from read speech of ten of Aesop's stories by 5 male and 5 female speakers. The results showed that the proposed method can improve the performance of Thai ASR in terms of reducing the word error rate.
- Conference paper: Improving web page classification by integrating neighboring pages via a topic (10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 –, 2010) Sriurai, Wongkot; Meesad, Phayung; Haruechaiyasak, Choochart
  This paper applies a topic model to represent the feature space for learning a Web page classification model. The Latent Dirichlet Allocation (LDA) algorithm is applied to generate a probabilistic topic model consisting of term features clustered into a set of latent topics. Words assigned to the same topic are semantically related. In addition, we propose a method to integrate the additional term features obtained from neighboring pages (i.e., parent and child pages) to further improve the performance of the classification model. In the experiments, we compared three different feature representations: (1) applying the simple bag-of-words (BOW) model, (2) applying the topic model on the current page, and (3) integrating the neighboring pages via the topic model. In the experimental results, integrating the current page with the neighboring pages via the topic model yielded the best performance, with an F1 measure of 84.51%, an improvement of 23.31% over the BOW model.
- Conference paper: Recommending related articles in Wikipedia via a topic-based model (9th International Conference On Innovative Internet Community Systems I2CS 2009, 2009) Sriurai, Wongkot; Meesad, Phayung; Haruechaiyasak, Choochart
  Wikipedia is currently the largest encyclopedia publicly available on the Web. In addition to keyword search and subject browsing, users may quickly access articles by following hyperlinks embedded within each article. The main drawback of this method is that some links to related articles could be missing from the current article. Also, a related article cannot be inserted as a hyperlink if there is no term describing it within the current article. In this paper, we propose an approach for recommending related articles based on the Latent Dirichlet Allocation (LDA) algorithm. By applying LDA to the anchor texts from each article, a set of diverse topics can be generated. An article can then be represented as a probability distribution over this topic set. Two articles with similar topic distributions are considered conceptually related. We performed an experiment on the Wikipedia Selection for Schools, which is a collection of 4,625 selected articles from Wikipedia. Based on an initial evaluation, our proposed method could generate a set of recommended articles that are more relevant than the linked articles given in the test articles.
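
The last abstract in this list describes representing each article as a probability distribution over topics and treating articles with similar distributions as related. A minimal sketch of that comparison step might look like the following. The abstract does not say which similarity measure the authors used, so cosine similarity is an assumption here, and the function names, article ids, and topic probabilities are all hypothetical illustration values rather than data from the paper.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def recommend(target, topic_dists, k=2):
    """Return the ids of the k articles most similar to `target`."""
    scores = [
        (cosine_similarity(topic_dists[target], dist), other)
        for other, dist in topic_dists.items()
        if other != target
    ]
    # Highest similarity first; ties are broken by article id.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scores[:k]]


# Hypothetical per-article distributions over three topics (made-up numbers).
topic_dists = {
    "article_a": [0.80, 0.15, 0.05],
    "article_b": [0.75, 0.20, 0.05],
    "article_c": [0.05, 0.10, 0.85],
    "article_d": [0.30, 0.60, 0.10],
}

print(recommend("article_a", topic_dists, k=2))
```

In a fuller pipeline the distributions would come from a fitted LDA model rather than being written by hand, and a divergence designed for probability distributions could be substituted for cosine similarity.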