Title: Improving web page classification by integrating neighboring pages via a topic
Authors/contributors: Sriurai, Wongkot; Meesad, Phayung; Haruechaiyasak, Choochart; Eichler, Gerald; Kropf, Peter; Lechner, Ulrike; Meesad, Phayung; Unger, Herwig
Year: 2010 (repository record dated 2019-01-11)
ISBN: 978-3-88579-259-8
ISSN: 1617-5468
Type: Text/Conference Paper
Language: en
URL: https://dl.gi.de/handle/20.500.12116/19019

Abstract:
This paper applies a topic model to represent the feature space for learning a Web page classification model. The Latent Dirichlet Allocation (LDA) algorithm is used to generate a probabilistic topic model in which term features are clustered into a set of latent topics; words assigned to the same topic are semantically related. In addition, we propose a method to integrate additional term features obtained from neighboring pages (i.e., parent and child pages) to further improve the performance of the classification model. In the experiments, we evaluated three different feature representations: (1) the simple bag-of-words (BOW) model, (2) the topic model applied to the current page only, and (3) the topic model applied to the current page integrated with its neighboring pages. The experimental results show that integrating the current page with its neighboring pages via the topic model yielded the best performance, with an F1 measure of 84.51%, an improvement of 23.31% over the BOW model.
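The abstract does not say exactly how neighbor terms are combined with the current page. The sketch below assumes the simplest reading: the tokens of the parent and child pages are appended to the current page's tokens, the merged document is passed to LDA, and the resulting per-document topic proportions serve as the feature vector for a downstream classifier. The LDA is a plain collapsed Gibbs sampler written from scratch; `lda_gibbs`, `augment_with_neighbors`, the hyperparameters, and the toy pages are all hypothetical and are not taken from the paper.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists. Returns one topic-proportion vector (theta) per doc.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total tokens assigned to each topic
    z = []                               # current topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)         # random initial topic
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed document-topic proportions; each vector sums to 1.
    return [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]


def augment_with_neighbors(page, parents, children):
    """Hypothetical integration: append parent and child tokens to the page."""
    tokens = list(page)
    for neighbor in list(parents) + list(children):
        tokens.extend(neighbor)
    return tokens


# Toy data: (current page, parent pages, child pages), already tokenized.
pages = [
    (["goal", "match", "team"], [["football", "league"]], [["score", "goal"]]),
    (["stock", "market", "price"], [["finance", "bank"]], [["trade", "price"]]),
    (["team", "league", "score"], [], [["match", "football"]]),
]
docs = [augment_with_neighbors(p, par, ch) for p, par, ch in pages]
features = lda_gibbs(docs, num_topics=2, iters=100)
```

Each row of `features` is a topic-proportion vector that could stand in for the BOW vector as classifier input. A realistic setup would train LDA on a large crawl and might weight neighbor terms differently from the page's own terms; this sketch treats them equally for simplicity.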