
Approaches for Automated Data Quality Analysis: Syntactic and Semantic Assessment

dc.contributor.author: Ahiagble, Agbodzea Pascal
dc.contributor.author: Stein, Hannah
dc.contributor.editor: Demmler, Daniel
dc.contributor.editor: Krupka, Daniel
dc.contributor.editor: Federrath, Hannes
dc.date.accessioned: 2022-09-28T17:10:57Z
dc.date.available: 2022-09-28T17:10:57Z
dc.date.issued: 2022
dc.description.abstract: Data quality significantly influences data usability and plays an important role in data trading. This paper presents a data quality analysis (DQA) of data tables on two levels. The first, the syntactic level, concerns the structure of the elements within the database; the second, the semantic level, concerns the relationship between the elements in the database and the "real world". Based on a literature review, the most relevant data quality criteria and corresponding metrics were derived. Subsequently, a service for automated syntactic DQA is designed and implemented, based on heuristics, a data-centric approach, and the unsupervised machine learning clustering algorithm DBSCAN. In the next step, an automated semantic DQA service is designed and implemented as well. This approach is used to examine data tables, for example, for missing relevant columns (i.e., semantic completeness). The services' output is a data quality index, derived from the automated analysis of various data quality criteria. This enables the assessment of data quality as well as the detection of potential for improving quality and thus increasing the value of tradeable data.
dc.identifier.doi: 10.18420/inf2022_85
dc.identifier.isbn: 978-3-88579-720-3
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/39592
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik, Bonn
dc.relation.ispartof: INFORMATIK 2022
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-326
dc.subject: Data quality assessment
dc.subject: data quality metrics
dc.subject: automated assessment services
dc.title: Approaches for Automated Data Quality Analysis: Syntactic and Semantic Assessment
gi.citation.endPage: 1036
gi.citation.startPage: 1023
gi.conference.date: 26-30 September 2022
gi.conference.location: Hamburg
gi.conference.sessiontitle: Datenqualität und Qualitätsmetriken in der Datenwirtschaft (DQ)
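
The abstract mentions a semantic-completeness check (missing relevant columns) and a data quality index that aggregates several criteria. The following is a minimal sketch of how such scores could be computed. It is not the authors' implementation: the function names, the example table, the list of expected columns, and the equal default weighting are all assumptions made for illustration, and the DBSCAN-based part of the syntactic analysis is not reproduced here.

```python
from typing import Dict, Optional, Sequence

Row = Dict[str, Optional[str]]


def syntactic_completeness(rows: Sequence[Row], columns: Sequence[str]) -> float:
    """Share of cells that are present and non-empty (hypothetical metric)."""
    total = len(rows) * len(columns)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for col in columns
        if row.get(col) not in (None, "")
    )
    return filled / total


def semantic_completeness(columns: Sequence[str], expected: Sequence[str]) -> float:
    """Share of expected (domain-relevant) columns found in the table.

    The expected-column list is an assumption; the source does not say
    how relevant columns are determined.
    """
    if not expected:
        return 1.0
    present = set(columns)
    return sum(1 for col in expected if col in present) / len(expected)


def quality_index(
    scores: Dict[str, float], weights: Optional[Dict[str, float]] = None
) -> float:
    """Weighted mean of criterion scores in [0, 1] (assumed aggregation)."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}  # equal weights by default
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical example table with two rows and three columns.
columns = ["id", "name", "city"]
rows = [
    {"id": "1", "name": "Ada", "city": "Bonn"},
    {"id": "2", "name": "", "city": None},
]
expected_columns = ["id", "name", "city", "country"]

scores = {
    "syntactic_completeness": syntactic_completeness(rows, columns),
    "semantic_completeness": semantic_completeness(columns, expected_columns),
}
index = quality_index(scores)
```

Keeping each criterion as a separate score in [0, 1] makes it possible to see which criterion lowers the index, which matches the abstract's aim of detecting potential for quality improvement.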

Files

Original bundle
Name: dq_02.pdf
Size: 323.15 KB
Format: Adobe Portable Document Format