YAWN: A Semantically Annotated Wikipedia XML Corpus

dc.contributor.author: Schenkel, Ralf
dc.contributor.author: Suchanek, Fabian
dc.contributor.author: Kasneci, Gjergji
dc.contributor.editor: Kemper, Alfons
dc.contributor.editor: Schöning, Harald
dc.contributor.editor: Rose, Thomas
dc.contributor.editor: Jarke, Matthias
dc.contributor.editor: Seidl, Thomas
dc.contributor.editor: Quix, Christoph
dc.contributor.editor: Brochhaus, Christoph
dc.date.accessioned: 2020-02-11T13:22:04Z
dc.date.available: 2020-02-11T13:22:04Z
dc.date.issued: 2007
dc.description.abstract: The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries. [en]
dc.identifier.isbn: 978-3-88579-197-3
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/31804
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e. V.
dc.relation.ispartof: Datenbanksysteme in Business, Technologie und Web (BTW 2007) – 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS)
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-103
dc.title: YAWN: A Semantically Annotated Wikipedia XML Corpus [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 291
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 277
gi.conference.date: 7-9 March 2007
gi.conference.location: Aachen
gi.conference.sessiontitle: Regular Research Papers

Files

Original bundle
Name: 277.pdf
Size: 780.39 KB
Format: Adobe Portable Document Format