Improving web page classification by integrating neighboring pages via a topic
dc.contributor.author | Sriurai, Wongkot | |
dc.contributor.author | Meesad, Phayung | |
dc.contributor.author | Haruechaiyasak, Choochart | |
dc.contributor.editor | Eichler, Gerald | |
dc.contributor.editor | Kropf, Peter | |
dc.contributor.editor | Lechner, Ulrike | |
dc.contributor.editor | Meesad, Phayung | |
dc.contributor.editor | Unger, Herwig | |
dc.date.accessioned | 2019-01-11T09:33:32Z | |
dc.date.available | 2019-01-11T09:33:32Z | |
dc.date.issued | 2010 | |
dc.description.abstract | This paper applies a topic model to represent the feature space for learning a Web page classification model. The Latent Dirichlet Allocation (LDA) algorithm is applied to generate a probabilistic topic model consisting of term features clustered into a set of latent topics. Words assigned to the same topic are semantically related. In addition, we propose a method to integrate additional term features obtained from neighboring pages (i.e., parent and child pages) to further improve the performance of the classification model. In the experiments, we compared three feature representations: (1) applying the simple bag-of-words (BOW) model, (2) applying the topic model to the current page, and (3) integrating the neighboring pages via the topic model. In the experimental results, integrating the current page with its neighboring pages via the topic model yielded the best performance, with an F1 measure of 84.51%, an improvement of 23.31% over the BOW model. | en |
dc.identifier.isbn | 978-3-88579-259-8 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/19019 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | 10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 – | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-165 | |
dc.title | Improving web page classification by integrating neighboring pages via a topic | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 246 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 238 | |
gi.conference.date | June 3-5, 2010 | |
gi.conference.location | Bangkok, Thailand | |
gi.conference.sessiontitle | Regular Research Papers |
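
The abstract describes representing each page by LDA topic features and enriching the current page with terms from its parent and child pages. The record does not say how the neighboring terms are merged or which LDA implementation was used, so the following is only a minimal sketch under stated assumptions: neighbor text is simply appended to the page text, topics are inferred with a small collapsed Gibbs sampler, and all names (`tokenize`, `build_documents`, `lda_topic_features`, the `pages` sample data) are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()


def build_documents(pages, use_neighbors):
    """Return one token list per page, optionally extended with neighbor text."""
    docs = []
    for page in pages:
        tokens = tokenize(page["text"])
        if use_neighbors:
            # Assumption: parent/child page text is simply appended.
            for neighbor_text in page["neighbors"]:
                tokens.extend(tokenize(neighbor_text))
        docs.append(tokens)
    return docs


def lda_topic_features(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total words per topic
    assignments = []

    def update(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            update(d, w, z, 1)
        assignments.append(zs)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                update(d, w, z, 1)

    # Smoothed topic proportions serve as the feature vector of each page.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


# Hypothetical toy data: each page has its own text plus neighbor texts.
pages = [
    {"text": "football match score", "neighbors": ["league goal team", "team player match"]},
    {"text": "team goal player", "neighbors": ["football league score"]},
    {"text": "stock market price", "neighbors": ["bank trade market", "price share trade"]},
    {"text": "bank share trade", "neighbors": ["stock price market"]},
]

features = lda_topic_features(build_documents(pages, use_neighbors=True), num_topics=2)
```

Each row of `features` is a topic-proportion vector that could be passed to any classifier in place of raw term counts. Calling `build_documents(pages, use_neighbors=False)` gives the current-page-only variant for comparison.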