Using neighborhood information for automated categorization of web pages
Regular Research Papers
Gesellschaft für Informatik e.V.
In this paper we discuss several issues related to how expanding a Web document's representation influences the quality of topical categorization of Web pages. We consider expanding a Web page with the text content of its linking pages. We show that naive expansion can pick up too much noise and substantially harm categorization results. We present an approach to automated pruning of linking Web pages. We report that using our approach to form a Web page representation always leads to better results than traditional single-page categorization.
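The abstract does not say how linking pages are pruned, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that each page is a bag-of-words term-count vector and that a linking page is kept only if its cosine similarity to the target page reaches a hypothetical `threshold`. Setting `threshold=0.0` keeps every neighbor, which corresponds to naive expansion. The names `bag_of_words`, `cosine`, and `expand_representation` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased term frequencies from a deliberately simple tokenizer."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_representation(page_text, neighbor_texts, threshold=0.2):
    """Merge the page's terms with those of linking pages that pass the filter.

    The cosine filter and `threshold` are hypothetical stand-ins for a pruning
    step; threshold=0.0 keeps every neighbor (naive expansion).
    """
    page = bag_of_words(page_text)
    expanded = Counter(page)
    kept = []
    for text in neighbor_texts:
        neighbor = bag_of_words(text)
        # Keep the linking page only if it is topically close to the target page.
        if cosine(page, neighbor) >= threshold:
            expanded.update(neighbor)
            kept.append(text)
    return expanded, kept


if __name__ == "__main__":
    page = "python tutorial for data analysis with python"
    neighbors = ["learn python data analysis", "cheap flights and hotel deals"]
    expanded, kept = expand_representation(page, neighbors, threshold=0.2)
    print(kept)  # the off-topic travel page is pruned
```

In this toy example the travel page shares no terms with the target page, so its similarity is 0 and it is dropped. With `threshold=0.0` it would be merged in, which adds exactly the kind of noise the abstract warns about.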