Workshop contribution
Merging Community Knowledge and Self-Interest to Build Language Resources: Architecture and Quality Management of a Take-and-Share-Approach of Word Annotations
Document type
Text/Workshop Paper
Date
2018
Publisher
Gesellschaft für Informatik e.V.
Abstract
Research data collected in the humanities exhibits a tremendous degree of heterogeneity, ranging from plain texts in written, spoken, transcribed, or otherwise enriched form (for example, with glosses or handwritten markings) to formal and informal proofs, test series, musical scores, archaeological 3D models, and complex multi-layered, audio-visual annotated corpus collections. A central argument brought forward in the humanities is that high-quality data is expensive to collect but easy for others to exploit, even before one's own research is finished. This paper suggests a possible solution for humanities projects in general, and for the annotation of words in large text corpora in particular, where very specific requirements stand in the way of applying standard, ready-to-use computational methods.
As a case in point, high-quality annotation of texts is time- and resource-intensive and hence expensive. Even if sufficient funds are available for manual tagging, which is still the gold standard for annotating texts, it remains an error-prone process in which quality control soon reaches its limits. In addition, a particular annotation is often needed by only a very limited number of users with very particular research questions, so the economies of scale and scope of a larger research community cannot easily be exploited. This paper addresses this issue by drawing on research from social psychology and by considering the specific properties of texts. As a result of these interdisciplinary analyses, the design of a web architecture is suggested that has the potential to overcome the above-mentioned dilemma and to significantly improve the quality of text annotations.