Identifying Information with Environmental Relevance in Web Sites of the Public Administration Using Automatic Classification
Innovations in Sharing Environmental Observations and Information
Abstract: The German environmental information portal PortalU is a publicly financed information infrastructure. It offers a single point of entry for searching all kinds of environmentally relevant information held by public authorities. From the point of view of PortalU's query facility, all information sources can be treated uniformly because their search results share a common structure. One of the key problems in PortalU is how to ensure the relevance of data that is held in a distributed index structure. Some data sources, e.g. metadata catalogues, typically contain only material that is highly relevant to the content model of PortalU, whereas other data sources, such as Web indexes, tend to contain more heterogeneous material. In this paper we describe how automatic classification of Web pages can be used to provide information with a high degree of environmental relevance.
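To make the idea of classifying Web pages by environmental relevance concrete, the following is a minimal sketch only. The abstract does not name the classifier used in PortalU, so a multinomial naive Bayes model with add-one smoothing stands in here as an assumption. The class name `NaiveBayesRelevance`, the labels `relevant` and `other`, and the toy training pages are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


class NaiveBayesRelevance:
    """Multinomial naive Bayes with add-one smoothing.

    Illustrative stand-in; not the method described in the paper.
    """

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training pages
        self.vocab = set()           # all tokens seen during training

    def train(self, pages):
        """Learn token statistics from an iterable of (text, label) pairs."""
        for text, label in pages:
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def log_scores(self, text):
        """Return the unnormalized log posterior of each label for the text."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log(
                    (counts[tok] + 1) / (total_words + len(self.vocab))
                )
            scores[label] = score
        return scores

    def classify(self, text):
        """Return the label with the highest log posterior."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data, invented for illustration only.
TRAINING_PAGES = [
    ("air quality measurements nitrogen dioxide emissions", "relevant"),
    ("groundwater protection and soil contamination report", "relevant"),
    ("nature conservation areas and biodiversity monitoring", "relevant"),
    ("opening hours of the citizen service office", "other"),
    ("press contact and staff directory of the ministry", "other"),
    ("job vacancies and application deadlines", "other"),
]

classifier = NaiveBayesRelevance()
classifier.train(TRAINING_PAGES)
```

In a setting like the one the abstract describes, such a classifier would be applied to crawled pages before they enter the Web index, so that only pages labelled as relevant are kept. A real deployment would need a much larger labelled corpus and language-aware preprocessing.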