
On the Method of Automated Categorization in the Internet Search Engine of the Bayern-Portal



Text/Conference Paper









Umweltbundesamt http://www.umweltbundesamt.de


Summary: The Bavaria portal, www.bayern.de, uses a "find machine" that covers the data space of public administration and closely related institutions, such as chambers of commerce, universities and employment offices. What makes the find machine special is that it divides its result lists by life event. The life events are based on an agreement reached by a working group of the federal and state ministers of the interior. They follow the gender mainstreaming approach, which categorizes by need rather than by a specific organization or subject area. In total, 650 life events were used for classification, divided into citizen life events and corporate life events.

The search engine of the Bavaria portal uses new technological methods to index and process public web offerings (government, authorities, municipalities, universities, etc.) according to the life events of citizens. In addition to semantic classification, the find machine can categorize automatically using the support vector machine (SVM) method, a mathematical-statistical approach. The advantage of the SVM method is that its results remain more stable when the data space changes over time. Because the approach evaluates the distribution of words in the word space, it does not depend on constructing a semantic definition, only on the probability of a word appearing. Another method, which is increasingly being employed, is a meta tag for life event and place, which enables direct classification.

The search space of the Bavarian search engine is limited to official offerings, municipalities, institutions of higher learning, employment offices, chambers of commerce, etc. In certain cases, additional institutions such as churches are included, for instance to take account of the regional supply of kindergartens. In total, approx. 6,000 URLs are tracked, with approx. 2.5 million indexed pages.
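The SVM-based categorization described in the summary can be illustrated with a minimal sketch. This is not the portal's implementation: the training snippets, the two class labels, the bag-of-words features and the hyperparameters (`epochs`, `lr`, `lam`) are all hypothetical, and the trainer is a plain sub-gradient descent on the hinge loss of a linear SVM. It only shows how a classifier can rely on word distributions rather than on a semantic definition.

```python
from collections import Counter

# Hypothetical training snippets (not from the portal):
# +1 = citizen life event, -1 = corporate life event.
TRAIN = [
    ("apply for birth certificate at registry office", 1),
    ("register newborn child birth certificate", 1),
    ("kindergarten place application for child", 1),
    ("register new business trade licence", -1),
    ("company tax registration for business founders", -1),
    ("trade licence application chamber of commerce", -1),
]


def tokenize(text):
    """Lowercase whitespace tokenization; real systems would do more."""
    return text.lower().split()


def build_vocab(texts):
    """Sorted list of all distinct tokens seen in the training texts."""
    return sorted({tok for t in texts for tok in tokenize(t)})


def featurize(text, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]


def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularization step: shrink all weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step: only samples inside the margin push the weights.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def classify(text, vocab, w, b):
    """Return the class whose side of the hyperplane the text falls on."""
    x = featurize(text, vocab)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "citizen" if score >= 0 else "corporate"


vocab = build_vocab([t for t, _ in TRAIN])
X = [featurize(t, vocab) for t, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_linear_svm(X, y)

print(classify("birth certificate for my child", vocab, w, b))
print(classify("trade licence for my business", vocab, w, b))
```

A system with 650 life events would need a multi-class scheme (for example one classifier per life event) and far more training data; the binary case is used here only to keep the sketch short.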
The data material is extremely varied. Along with HTML pages, the search engine also reads and fully indexes PDF and DOC files, among other formats. As a result, documents such as municipal bulletins and other reports running to several hundred pages are collected and fully indexed. While reports can sometimes be delimited thematically section by section, this is often not the case with bulletins such as community magazines. These frequently contain a chronological or editorial mixture of news, reports and statements whose content cannot be assigned in detail to the life event classifications.
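For HTML pages, the meta tag route mentioned in the summary could look roughly like the following sketch. The meta names `lifeevent` and `place` are assumptions made for illustration; the summary does not state which tag names the portal actually evaluates. The parser is Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class LifeEventMetaParser(HTMLParser):
    """Collects the content of selected <meta name="..."> tags."""

    def __init__(self, wanted=("lifeevent", "place")):
        super().__init__()
        # Hypothetical meta names; the real portal may use different ones.
        self.wanted = wanted
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name in self.wanted and content:
            self.found[name] = content.strip()


def classify_by_meta(html):
    """Return a dict of the life event and place meta values found, if any."""
    parser = LifeEventMetaParser()
    parser.feed(html)
    return parser.found


page = (
    '<html><head>'
    '<meta name="lifeevent" content="birth">'
    '<meta name="place" content="Augsburg">'
    '</head><body>Registry office information</body></html>'
)
print(classify_by_meta(page))
```

A page without such tags yields an empty dict, in which case a statistical or semantic classifier would have to take over.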


Weihs, Erich (2008): Zur Methode automatisierter Kategorisierungsverfahren der Internetsuchmaschine des Bayern-Portals. Workshop des Arbeitskreises „Umweltdatenbanken / Umweltinformationssysteme“ der Fachgruppe „Informatik im Umweltschutz“. Dessau-Roßlau: Umweltbundesamt http://www.umweltbundesamt.de. Umweltinformationssysteme: Suchmaschinen und Wissensmanagement - Methoden und Instrumente. Dessau. 2008