YAWN: A Semantically Annotated Wikipedia XML Corpus

Schenkel, Ralf; Suchanek, Fabian; Kasneci, Gjergji

dc.contributor.author	Schenkel, Ralf
dc.contributor.author	Suchanek, Fabian
dc.contributor.author	Kasneci, Gjergji
dc.contributor.editor	Kemper, Alfons
dc.contributor.editor	Schöning, Harald
dc.contributor.editor	Rose, Thomas
dc.contributor.editor	Jarke, Matthias
dc.contributor.editor	Seidl, Thomas
dc.contributor.editor	Quix, Christoph
dc.contributor.editor	Brochhaus, Christoph
dc.date.accessioned	2020-02-11T13:22:04Z
dc.date.available	2020-02-11T13:22:04Z
dc.date.issued	2007
dc.description.abstract	The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.	en
dc.identifier.isbn	978-3-88579-197-3
dc.identifier.pissn	1617-5468
dc.identifier.uri	https://dl.gi.de/handle/20.500.12116/31804
dc.language.iso	en
dc.publisher	Gesellschaft für Informatik e. V.
dc.relation.ispartof	Datenbanksysteme in Business, Technologie und Web (BTW 2007) – 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS)
dc.relation.ispartofseries	Lecture Notes in Informatics (LNI) - Proceedings, Volume P-103
dc.title	YAWN: A Semantically Annotated Wikipedia XML Corpus	en
dc.type	Text/Conference Paper
gi.citation.endPage	291
gi.citation.publisherPlace	Bonn
gi.citation.startPage	277
gi.conference.date	07.-09.03.2007
gi.conference.location	Aachen
gi.conference.sessiontitle	Regular Research Papers

Files

Original bundle

Name: 277.pdf
Size: 780.39 KB
Format: Adobe Portable Document Format

Collections

P103 - BTW2007 - Datenbanksysteme in Business, Technologie und Web