Browsing by author "Sriurai, Wongkot"
- Conference paper: Improving web page classification by integrating neighboring pages via a topic (10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 –, 2010) Sriurai, Wongkot; Meesad, Phayung; Haruechaiyasak, Choochart
  This paper applies a topic model to represent the feature space for learning a Web page classification model. The Latent Dirichlet Allocation (LDA) algorithm is applied to generate a probabilistic topic model consisting of term features clustered into a set of latent topics. Words assigned to the same topic are semantically related. In addition, we propose a method to integrate the additional term features obtained from neighboring pages (i.e., parent and child pages) to further improve the performance of the classification model. In the experiments, we compared three feature representations: (1) the simple bag-of-words (BOW) model, (2) the topic model applied to the current page, and (3) the neighboring pages integrated via the topic model. Integrating the current page with its neighboring pages via the topic model yielded the best performance, with an F1 measure of 84.51%, an improvement of 23.31% over the BOW model.
- Conference paper: Recommending related articles in wikipedia via a topic-based model (9th International Conference on Innovative Internet Community Systems I2CS 2009, 2009) Sriurai, Wongkot; Meesad, Phayung; Haruechaiyasak, Choochart
  Wikipedia is currently the largest encyclopedia publicly available on the Web. In addition to keyword search and subject browsing, users may quickly access articles by following hyperlinks embedded within each article. The main drawback of this method is that some links to related articles may be missing from the current article. Also, a related article cannot be inserted as a hyperlink if no term describing it appears within the current article. In this paper, we propose an approach for recommending related articles based on the Latent Dirichlet Allocation (LDA) algorithm. By applying LDA to the anchor texts from each article, a set of diverse topics can be generated. An article can then be represented as a probability distribution over this topic set. Two articles with similar topic distributions are considered conceptually related. We performed an experiment on the Wikipedia Selection for Schools, a collection of 4,625 selected articles from Wikipedia. In an initial evaluation, the proposed method generated a set of recommended articles that are more relevant than the linked articles given in the test articles.
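
The idea in the second abstract, that two articles with similar topic distributions are conceptually related, can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors used, so the Hellinger distance here is an assumption. The function names (`hellinger`, `recommend`), the article names and the three-topic distributions are likewise hypothetical and made up for illustration; in practice the distributions would come from an LDA model fitted on the articles' anchor texts.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Assumed similarity measure; 0.0 means identical, 1.0 means disjoint support.
    """
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2
    )


def recommend(target, topic_dists, k=2):
    """Return the k article names whose topic distributions are closest to target's."""
    scores = [
        (hellinger(topic_dists[target], dist), name)
        for name, dist in topic_dists.items()
        if name != target
    ]
    # Smaller distance means more similar, so sort ascending and keep the first k.
    return [name for _, name in sorted(scores)[:k]]


# Hypothetical per-article topic distributions over 3 latent topics.
topic_dists = {
    "Volcano": [0.80, 0.15, 0.05],
    "Earthquake": [0.75, 0.20, 0.05],
    "Mozart": [0.05, 0.05, 0.90],
    "Beethoven": [0.10, 0.05, 0.85],
}

print(recommend("Volcano", topic_dists, k=1))
```

With these made-up distributions, the article closest to "Volcano" is "Earthquake", since both place most of their probability mass on the same topic. Other measures for comparing distributions, such as cosine similarity or Jensen-Shannon divergence, would fit the same structure.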