Improving web page classification by integrating neighboring pages via a topic

dc.contributor.author: Sriurai, Wongkot
dc.contributor.author: Meesad, Phayung
dc.contributor.author: Haruechaiyasak, Choochart
dc.contributor.editor: Eichler, Gerald
dc.contributor.editor: Kropf, Peter
dc.contributor.editor: Lechner, Ulrike
dc.contributor.editor: Meesad, Phayung
dc.contributor.editor: Unger, Herwig
dc.date.accessioned: 2019-01-11T09:33:32Z
dc.date.available: 2019-01-11T09:33:32Z
dc.date.issued: 2010
dc.description.abstract: This paper applies a topic model to represent the feature space for learning a Web page classification model. The Latent Dirichlet Allocation (LDA) algorithm is applied to generate a probabilistic topic model consisting of term features clustered into a set of latent topics. Words assigned to the same topic are semantically related. In addition, we propose a method to integrate additional term features obtained from neighboring pages (i.e., parent and child pages) to further improve the performance of the classification model. In the experiments, we compared three feature representations: (1) applying the simple bag-of-words (BOW) model, (2) applying the topic model to the current page, and (3) integrating the neighboring pages via the topic model. In the experimental results, integrating the current page with its neighboring pages via the topic model yielded the best performance, with an F1 measure of 84.51%, an improvement of 23.31% over the BOW model. [en]
dc.identifier.isbn: 978-3-88579-259-8
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/19019
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: 10th International Conference on Innovative Internet Community Systems (I2CS) – Jubilee Edition 2010 –
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-165
dc.title: Improving web page classification by integrating neighboring pages via a topic [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 246
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 238
gi.conference.date: June 3-5, 2010
gi.conference.location: Bangkok, Thailand
gi.conference.sessiontitle: Regular Research Papers
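
The abstract describes representing a page by topic features and then adding features from its parent and child pages. The record does not say how the paper combines them, so the following is only a minimal sketch under stated assumptions: the word-to-topic mapping `WORD_TO_TOPIC` stands in for the output of a trained LDA model, and the function names `topic_features` and `integrate_neighbors`, the normalization, and the choice to concatenate the three vectors are all hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def topic_features(tokens, word_to_topic, num_topics):
    """Return the share of a page's known tokens that falls in each topic."""
    # Tokens without a topic assignment (e.g. stop words) are ignored.
    counts = Counter(word_to_topic[t] for t in tokens if t in word_to_topic)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_topics
    return [counts[k] / total for k in range(num_topics)]


def integrate_neighbors(page, parents, children, word_to_topic, num_topics):
    """Concatenate topic vectors of the current, parent, and child pages.

    Concatenation is an assumption made for illustration; other ways of
    combining neighbor features are equally plausible.
    """
    features = []
    for tokens in (page, parents, children):
        features.extend(topic_features(tokens, word_to_topic, num_topics))
    return features


# Hypothetical word-to-topic assignment standing in for a trained LDA model.
WORD_TO_TOPIC = {"goal": 0, "match": 0, "stock": 1, "market": 1}

example = integrate_neighbors(
    page=["goal", "match", "stock", "the"],
    parents=["stock", "market"],
    children=[],
    word_to_topic=WORD_TO_TOPIC,
    num_topics=2,
)
print(example)
```

The resulting vector has one block of topic proportions per page role, so a downstream classifier could weight evidence from the current page and from its neighbors separately.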

Files

Original bundle
Name: 238.pdf
Size: 211.39 KB
Format: Adobe Portable Document Format