YAWN: A Semantically Annotated Wikipedia XML Corpus

This paper presents YAWN, a system that converts the widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. The annotation process exploits Wikipedia's category information, a high-quality, manually assigned source; extracts additional information from lists; and uses the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
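As a rough illustration of the idea in the abstract, the following minimal sketch maps a page's categories to concept tags and then runs a structural query over the result. It is not the authors' algorithm: the `CONCEPTS` table is a hypothetical hand-written stand-in for a WordNet lookup, the element and attribute names (`article`, `concept`, `name`, `source`) are invented for this example, and taking the last word of a category as its head noun is a simplifying assumption that fails for names such as "Cities in Germany".

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a WordNet lookup: category head noun -> concept name.
CONCEPTS = {"physicists": "physicist", "cities": "city"}


def concept_for(category):
    """Return a concept for a category based on its last word, or None."""
    head = category.lower().split()[-1]
    return CONCEPTS.get(head)


def annotate(title, text, categories):
    """Build an XML element for a page with one <concept> child per recognised category."""
    page = ET.Element("article", {"title": title})
    for category in categories:
        concept = concept_for(category)
        if concept is not None:
            ET.SubElement(page, "concept", {"name": concept, "source": category})
    body = ET.SubElement(page, "body")
    body.text = text
    return page


page = annotate(
    "Albert Einstein",
    "Einstein was a theoretical physicist.",
    ["German physicists", "1879 births"],
)
xml_string = ET.tostring(page, encoding="unicode")

# A structural query: concept annotations naming "physicist".
matches = page.findall("./concept[@name='physicist']")
```

The query at the end selects on the annotation rather than on the page text, which is the kind of tag-based precision the abstract alludes to.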

Schenkel, Ralf; Suchanek, Fabian; Kasneci, Gjergji (2007): YAWN: A Semantically Annotated Wikipedia XML Corpus. Datenbanksysteme in Business, Technologie und Web (BTW 2007) – 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS). Bonn: Gesellschaft für Informatik e. V. PISSN: 1617-5468. ISBN: 978-3-88579-197-3. pp. 277-291. Regular Research Papers. Aachen, 7-9 March 2007.

Collections

P103 - BTW2007 - Datenbanksysteme in Business, Technologie und Web
