Discovering Data Quality Problems

Zhang, Ruojing; Indulska, Marta; Sadiq, Shazia

Journal article

Document type

Text/Journal Article

Date

2019

Authors

Zhang, Ruojing

Indulska, Marta

Sadiq, Shazia

Source

Business & Information Systems Engineering: Vol. 61, No. 5

Publisher

Springer

Abstract

Existing methodologies for identifying data quality problems are typically user-centric, where data quality requirements are first determined in a top-down manner following well-established design guidelines, organizational structures and data governance frameworks. In the current data landscape, however, users are often confronted with new, unexplored datasets that they may not have any ownership of, but that are perceived to have relevance and potential to create value for them. Such repurposed datasets can be found in government open data portals, data markets and several publicly available data repositories. In such scenarios, applying top-down data quality checking approaches is not feasible, as the consumers of the data have no control over its creation and governance. Hence, data consumers (data scientists and analysts) need to be empowered with data exploration capabilities that allow them to investigate and understand the quality of such datasets to facilitate well-informed decisions on their use. This research aims to develop such an approach for discovering data quality problems using generic exploratory methods that can be effectively applied in settings where data creation and use are separated. The approach, named LANG, is developed using Design Science on the basis of semiotics theory and data quality dimensions. LANG is empirically validated in terms of the soundness of the approach, its repeatability and its generalizability.

Zhang, Ruojing; Indulska, Marta; Sadiq, Shazia (2019): Discovering Data Quality Problems. Business & Information Systems Engineering: Vol. 61, No. 5. DOI: 10.1007/s12599-019-00608-0. Springer. PISSN: 1867-0202. pp. 575-593

Keywords

Data quality, Design science, Open data

DOI

10.1007/s12599-019-00608-0

Collections

BISE 61(5) - October 2019
