Web Usage Analysis for Assessing and Improving Metadata Quality
Because of Directive 2003/4/EC on public access to environmental information, public authorities have invested large amounts of money and time in metadata management projects. In Germany, for instance, the Environmental Data Catalogue (Umweltdatenkatalog, UDK) was designed as a Web-based meta-information system for retrieving environmental information held by public authorities. The benefits of the system depend on the quality of the metadata provided. As is well known, data quality has several dimensions, e.g. accessibility, completeness, relevancy and timeliness. To guarantee high metadata quality, public authorities have to develop a long-term metadata management strategy for the Environmental Data Catalogue. At the tactical level, this requires continuous monitoring of how the catalogue is used. One attractive way to analyse this usage is to evaluate the log files written by the Environmental Data Catalogue. To our knowledge, however, only a few German authorities, e.g. the State Institute for Environment, Measurement and Nature Conservation Baden-Württemberg, have started to analyse the UDK log files continuously and systematically. There are two reasons: (a) the UDK software provides only rudimentary support for managing the logging process; (b) the log files are difficult to analyse because there are no metrics and no tools for the application-specific log files of the Environmental Data Catalogue. For these reasons, we present in this contribution procedures, metrics and a tool for analysing the UDK log files. The UDK Log-File-Analyser takes the daily log files as input, aggregates and analyses the data, and produces four reports: a summary report, a weekly report, an error report and a search term evaluation report. The reports list several metrics (e.g. the number of sessions and object views, the most popular UDK objects and the most popular search terms). The statistics are presented in tabular and graphical form.
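The abstract does not describe the UDK log format or how the UDK Log-File-Analyser is implemented. The following is therefore only a minimal sketch, assuming a hypothetical tab-separated record layout (`timestamp`, `session_id`, `event`, `value`), of how summary metrics such as the number of sessions, the number of object views, the most popular objects and the most popular search terms could be aggregated from daily log lines. The names `summarize` and `DailySummary` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass

# ASSUMPTION: the real UDK log format is not given in the abstract.
# Each line is assumed to be tab-separated as
#   timestamp <TAB> session_id <TAB> event <TAB> value
# where event is "search" (value = search term) or "view" (value = object id).


@dataclass
class DailySummary:
    sessions: int
    object_views: int
    top_objects: list  # [(object_id, views), ...] most viewed first
    top_terms: list    # [(search_term, count), ...] most frequent first


def summarize(lines, top_n=10):
    """Aggregate (hypothetical) UDK log lines into summary-report metrics."""
    sessions = set()
    views = Counter()
    terms = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip malformed lines instead of aborting the report
        _timestamp, session_id, event, value = parts
        sessions.add(session_id)
        if event == "view":
            views[value] += 1
        elif event == "search":
            terms[value.strip().lower()] += 1  # count "Ozone" and "ozone" together
    return DailySummary(
        sessions=len(sessions),
        object_views=sum(views.values()),
        top_objects=views.most_common(top_n),
        top_terms=terms.most_common(top_n),
    )


if __name__ == "__main__":
    sample = [
        "2004-10-01T08:00:00\ts1\tsearch\tOzone",
        "2004-10-01T08:00:05\ts1\tview\tobj-17",
        "2004-10-01T09:12:00\ts2\tsearch\tozone",
        "2004-10-01T09:12:30\ts2\tview\tobj-17",
        "2004-10-01T09:13:00\ts2\tview\tobj-42",
    ]
    print(summarize(sample))
```

Feeding one file per day into such a function and concatenating the results would give the raw material for a weekly report; search terms are lower-cased here so that spelling variants in capitalisation are counted together.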
Using our tool, we analysed the daily log files of the Environmental Data Catalogue of the German State of Baden-Württemberg from October 1st, 2003 to October 31st, 2004. In doing so, we identified several data quality problems with respect to content and accessibility. From this empirical study we derived several activities to solve these problems. For instance, the search term evaluation report produced by our tool identifies the most popular keywords; assigning these keywords to relevant UDK objects improves the accessibility of the data. In addition, our tool provides a list of search terms for which the Environmental Data Catalogue did not deliver relevant objects. This list can help identify missing objects, and adding these objects improves the completeness of the data. In summary, our tool and metrics give public authorities instruments for guiding their metadata management and for improving the metadata quality of the Environmental Data Catalogue. Moreover, our results show that web usage analysis is a promising way to gain insight into the public's demand for environmental information.
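The list of search terms for which the catalogue delivered no relevant objects can be approximated in a simple way. The sketch below assumes, hypothetically, that each search event can be reduced to a `(term, hit_count)` pair and treats a hit count of zero as a proxy for "no relevant objects"; relevance itself cannot be read from hit counts, and the function name `zero_hit_terms` is illustrative, not part of the UDK software.

```python
from collections import Counter


def zero_hit_terms(search_events, min_count=1):
    """Rank search terms that returned no objects, most frequent first.

    search_events: iterable of (term, hit_count) pairs. This input shape is
    an ASSUMPTION for illustration; the abstract does not say how hit counts
    are recorded in the UDK log files.
    """
    misses = Counter()
    for term, hit_count in search_events:
        if hit_count == 0:
            misses[term.strip().lower()] += 1  # merge case variants of a term
    # Keep only terms that went unanswered at least min_count times.
    return [(term, n) for term, n in misses.most_common() if n >= min_count]


if __name__ == "__main__":
    events = [("Nitrate", 0), ("ozone", 12), ("nitrate", 0), ("radon", 0)]
    for term, count in zero_hit_terms(events):
        print(f"{term}: {count} unanswered searches")
```

Sorting by frequency puts the most often unanswered terms first, which is where adding missing objects would likely have the largest effect on completeness; raising `min_count` filters out one-off queries and typos.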