Quality Indicators for Text Data

dc.contributor.author: Kiefer, Cornelia
dc.contributor.editor: Meyer, Holger
dc.contributor.editor: Ritter, Norbert
dc.contributor.editor: Thor, Andreas
dc.contributor.editor: Nicklas, Daniela
dc.contributor.editor: Heuer, Andreas
dc.contributor.editor: Klettke, Meike
dc.date.accessioned: 2019-04-15T11:40:30Z
dc.date.available: 2019-04-15T11:40:30Z
dc.date.issued: 2019
dc.description.abstract: Textual data sets vary in terms of quality. They have different characteristics such as the average sentence length or the number of spelling mistakes and abbreviations. These text characteristics influence the quality of text mining results. They may be measured automatically by means of quality indicators. We present indicators that we implemented based on natural language processing libraries such as Stanford CoreNLP and NLTK. We discuss design decisions in the implementation of exemplary indicators and provide all indicators on GitHub. In the evaluation, we investigate free texts from production, news, prose, tweets and chat data and show that the suggested indicators predict the quality of two text mining modules. [en]
dc.identifier.doi: 10.18420/btw2019-ws-15
dc.identifier.isbn: 978-3-88579-684-8
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/21801
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: BTW 2019 – Workshopband
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) – Proceedings, Volume P-290
dc.subject: data quality
dc.subject: text data quality
dc.subject: text mining
dc.subject: text analysis
dc.subject: quality indicators for text data
dc.title: Quality Indicators for Text Data [en]
gi.citation.endPage: 154
gi.citation.startPage: 145
gi.conference.date: 4–8 March 2019
gi.conference.location: Rostock
gi.conference.sessiontitle: Workshop on Big (and Small) Data in Science and Humanities (BigDS 2019)

Files

Original bundle
Name: C2-5.pdf
Size: 199.5 KB
Format: Adobe Portable Document Format