Workshop Contribution

Merging Community Knowledge and Self-Interest to Build Language Resources: Architecture and Quality Management of a Take-and-Share-Approach of Word Annotations

Document type

Text/Workshop Paper

Date

2018

Publisher

Gesellschaft für Informatik e.V.

Abstract

Research data collected in the humanities are highly heterogeneous, ranging from plain texts in written, spoken, or transcribed form, or texts enriched by glosses or handwritten markings, to formal and informal proofs, test series, musical scores, archaeological 3D models, and complex multi-layered, audio-visually annotated corpus collections. A central argument brought forward in the humanities is that high-quality data are expensive to collect but easy for others to exploit, even before one's own research is finished. This paper suggests a possible solution for humanities projects in general and for the annotation of words in large text corpora in particular, where very specific requirements hinder the application of ready-to-use standard computational methods. As a case in point, high-quality annotations of texts are time- and resource-intensive and hence expensive. Even if sufficient funds are supplied for manual tagging, still the gold standard of annotating texts, it remains an error-prone process in which quality control soon reaches its limits. In addition, often only a very limited number of users need the particular annotations required for very particular research questions, so that the economies of scale and scope of a larger research community cannot easily be exploited. This paper addresses this issue by taking into account research from social psychology and by considering the specific properties of texts. As a result of these interdisciplinary analyses, the design of a web architecture is suggested that has the potential to overcome the above-mentioned dilemma and to significantly improve the quality of text annotations.

Description

Peukert, Hagen (2018): Merging Community Knowledge and Self-Interest to Build Language Resources: Architecture and Quality Management of a Take-and-Share-Approach of Word Annotations. INF-DH-2018. DOI: 10.18420/infdh2018-01. Bonn: Gesellschaft für Informatik e.V. GI-Workshop. Berlin, Germany. 25 September 2018
