Listing by keyword "Data quality"
1 - 9 of 9
- Journal article: Capturing Enterprise Data Integration Challenges Using a Semiotic Data Quality Framework (Business & Information Systems Engineering: Vol. 57, No. 1, 2015) Krogstie, John. Enterprises have a large amount of data available, represented in different formats and normally accessible to different specialists through different tools. Integrating existing data, including data from more informal sources, can have great business value, as discussed for instance in connection with big data. On the other hand, the level of integration and exploitation will depend both on the data quality of the sources to be integrated and on how the data quality of the different sources matches. Whereas data quality frameworks often consist of unstructured lists of characteristics, here a framework is used that has traditionally been applied to enterprise and business model quality, with the data quality characteristics structured relative to semiotic levels, which makes it easier to compare aspects in order to find opportunities and challenges for data integration. A case study presenting the practical application of the framework illustrates the usefulness of the approach for this purpose. The approach reveals opportunities, but also challenges, when trying to integrate data from different data sources typically used by people in different roles in an organization.
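To illustrate the comparison idea in the entry above, a minimal sketch follows. The levels, source names, and scores are all invented for illustration; the paper's semiotic framework is conceptual, not code-based.

```python
# Hypothetical sketch: score each source's data quality per semiotic level,
# then contrast sources to spot integration challenges. All values invented.
SEMIOTIC_LEVELS = ["syntactic", "semantic", "pragmatic", "social"]

sources = {
    "ERP": {"syntactic": 0.9, "semantic": 0.8, "pragmatic": 0.7, "social": 0.8},
    "field_reports": {"syntactic": 0.5, "semantic": 0.6, "pragmatic": 0.9, "social": 0.6},
}

for level in SEMIOTIC_LEVELS:
    scores = {name: profile[level] for name, profile in sources.items()}
    gap = max(scores.values()) - min(scores.values())
    flag = "  <- potential integration challenge" if gap >= 0.3 else ""
    print(f"{level:10s} {scores}{flag}")
```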
- Journal article: Data-Centric Artificial Intelligence (Business & Information Systems Engineering: Vol. 66, No. 4, 2024) Jakubik, Johannes; Vössing, Michael; Kühl, Niklas; Walk, Jannis; Satzger, Gerhard. Data-centric artificial intelligence (data-centric AI) is an emerging paradigm that emphasizes enhancing data systematically and at scale to build effective and efficient AI-based systems. The novel paradigm complements the recent model-centric AI, which focuses on improving the performance of AI-based systems through changes to the model while using a fixed set of data. The objective of this article is to introduce practitioners and researchers from the field of Business and Information Systems Engineering (BISE) to data-centric AI. The paper defines relevant terms, provides key characteristics to contrast the paradigm of data-centric AI with the model-centric one, and introduces a framework to illustrate the different dimensions of data-centric AI. In addition, an overview of available tools for data-centric AI is presented and the novel paradigm is differentiated from related concepts. Finally, the paper discusses the longer-term implications of data-centric AI for the BISE community.
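To make the model-centric/data-centric contrast concrete, here is a minimal, hypothetical sketch; the article contains no code, and the label-noise setup, names, and numbers below are invented.

```python
# Illustrative contrast: tune the model on fixed (noisy) data vs. keep the
# model fixed and systematically improve the data instead.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Toy dataset: two Gaussian blobs, with 15% of training labels flipped
# to simulate a data quality problem.
X_train = rng.normal(loc=[[0, 0]] * 500 + [[3, 3]] * 500, scale=1.0)
y_train = np.array([0] * 500 + [1] * 500)
flipped = rng.choice(len(y_train), size=150, replace=False)
y_noisy = y_train.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

X_test = rng.normal(loc=[[0, 0]] * 200 + [[3, 3]] * 200, scale=1.0)
y_test = np.array([0] * 200 + [1] * 200)

# Model-centric view: the data is fixed; we only adjust the model.
model = LogisticRegression(C=0.1).fit(X_train, y_noisy)
print("fixed data, tuned model:", accuracy_score(y_test, model.predict(X_test)))

# Data-centric view: the model is fixed; we improve the data, e.g., by
# dropping training points whose label disagrees with a confident
# prediction of a first-pass model (a simple noise filter).
probe = LogisticRegression().fit(X_train, y_noisy)
proba = probe.predict_proba(X_train)
suspect = (proba.max(axis=1) > 0.9) & (probe.predict(X_train) != y_noisy)

model = LogisticRegression(C=0.1).fit(X_train[~suspect], y_noisy[~suspect])
print("cleaned data, same model:", accuracy_score(y_test, model.predict(X_test)))
```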
- Conference paper: Deep Learning Datasets Challenges For Semantic Segmentation - A Survey (INFORMATIK 2023 - Designing Futures: Zukünfte gestalten, 2023) Ponciano, Claire; Schaffert, Markus; Ponciano, Jean-Jacques. This survey offers a comprehensive analysis of challenges encountered when employing large-scale datasets for deep learning-based semantic segmentation, an area with significant implications for industries such as autonomous driving, precision agriculture, and medical imaging. Through a systematic review of 94 papers from Papers with Code, we identified 32 substantial challenges, which we categorized into six key areas: Data Quality and Quantity, Data Preprocessing, Resource Constraints, Data Management and Privacy, Generalization, and Data Compatibility. By identifying and explicating these challenges, our research provides a crucial reference point for future studies aiming to address these issues and enhance the performance of deep learning models for semantic segmentation. Future work will focus on leveraging AI and semantic technologies to provide solutions to these challenges.
- Conference paper: Detecting Quality Problems in Research Data: A Model-Driven Approach (Software Engineering 2021, 2021) Kesper, Arno; Wenz, Viola; Taentzer, Gabriele. The quality of research data is essential for scientific progress. A major challenge in data quality assurance is the localisation of quality problems that are inherent to data. Based on the observation of a dynamic shift in the database technologies employed, we present a model-driven approach to analyse the quality of research data. It allows a data engineer to formulate anti-patterns that are generic concerning the database format and technology. A domain expert chooses a pattern that has been adapted to a specific database technology and concretises it for a domain-specific database format. The resulting concrete pattern is used by a data analyst to locate quality problems in the database. As a proof of concept, we implemented tool support that realises this approach for XML databases. We evaluated our approach concerning expressiveness and performance. The original paper was published at the International Conference on Model Driven Engineering Languages and Systems 2020.
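As an illustration of the pattern idea described above, the following hypothetical sketch concretises a generic anti-pattern ("mandatory field missing or empty") for a toy XML record format; it is not the authors' tooling, and the record format and field names are invented.

```python
# Locate quality problems in XML records using a concretised anti-pattern.
import xml.etree.ElementTree as ET

SAMPLE = """
<records>
  <record id="r1"><title>Vase</title><date>1885</date></record>
  <record id="r2"><title></title><date>1885</date></record>
  <record id="r3"><date>unknown</date></record>
</records>
"""

def missing_or_empty(record: ET.Element, field: str) -> bool:
    """Concrete instance of the generic anti-pattern for one field."""
    node = record.find(field)
    return node is None or not (node.text or "").strip()

root = ET.fromstring(SAMPLE)
for record in root.iter("record"):
    for field in ("title", "date"):  # domain-specific mandatory fields
        if missing_or_empty(record, field):
            print(f"quality problem: record {record.get('id')} lacks <{field}>")
```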
- Journal article: Discovering Data Quality Problems (Business & Information Systems Engineering: Vol. 61, No. 5, 2019) Zhang, Ruojing; Indulska, Marta; Sadiq, Shazia. Existing methodologies for identifying data quality problems are typically user-centric, where data quality requirements are first determined in a top-down manner following well-established design guidelines, organizational structures and data governance frameworks. In the current data landscape, however, users are often confronted with new, unexplored datasets that they may not have any ownership of, but that are perceived to have relevance and potential to create value for them. Such repurposed datasets can be found in government open data portals, data markets and several publicly available data repositories. In such scenarios, applying top-down data quality checking approaches is not feasible, as the consumers of the data have no control over its creation and governance. Hence, data consumers (data scientists and analysts) need to be empowered with data exploration capabilities that allow them to investigate and understand the quality of such datasets to facilitate well-informed decisions on their use. This research aims to develop such an approach for discovering data quality problems using generic exploratory methods that can be effectively applied in settings where data creation and use is separated. The approach, named LANG, is developed through a Design Science approach on the basis of semiotics theory and data quality dimensions. LANG is empirically validated in terms of soundness of the approach, its repeatability and generalizability.
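The sketch below illustrates the kind of bottom-up, exploratory profiling the paper motivates for repurposed datasets. It is a generic example using pandas, not the LANG approach itself, and the dataset is invented.

```python
# Exploratory profiling of an unfamiliar dataset: duplicates, missing
# values, constant columns, and mixed-type columns as quality signals.
import pandas as pd

# Stand-in for a repurposed dataset pulled from an open data portal.
df = pd.DataFrame({
    "station": ["A", "A", "B", None, "B"],
    "reading": [1.2, 1.2, "n/a", 3.4, 5.6],   # mixed types: float and str
    "unit": ["mg/L"] * 5,                     # constant column
})

report = {
    "rows": len(df),
    "duplicate_rows": int(df.duplicated().sum()),
    "missing_per_column": df.isna().sum().to_dict(),
    "constant_columns": [c for c in df.columns if df[c].nunique(dropna=True) <= 1],
    "mixed_type_columns": [
        c for c in df.select_dtypes(include="object")
        if df[c].dropna().map(type).nunique() > 1
    ],
}
for check, result in report.items():
    print(check, "->", result)
```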
- Journal article: Model-based Analysis of Data Inaccuracy Awareness in Business Processes (Business & Information Systems Engineering: Vol. 64, No. 2, 2022) Evron, Yotam; Soffer, Pnina; Zamansky, Anna. Problem definition: Data errors in business processes can be a source of exceptions and hamper business outcomes. Relevance: The paper proposes a method for analyzing data inaccuracy issues already at process design time, in order to support process designers by identifying process parts where data errors might remain unrecognized, so that decisions could be taken based on inaccurate data. Methodology: The paper follows design science, developing a method as an artifact. The conceptual basis is the notion of data inaccuracy awareness, that is, the ability to tell whether potential discrepancies between real and IS values may exist. Results: The method was implemented on top of a Petri net modeling tool and validated in a case study performed at a large manufacturer of safety-critical systems. Managerial implications: Anticipating the consequences of data inaccuracy already during process design can help avoid them at runtime.
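As a toy illustration of the data inaccuracy awareness notion (not the authors' Petri-net-based method), the sketch below tracks whether an IS value may have diverged from its real-world counterpart and flags decisions that read such values; all names and events are hypothetical.

```python
# Track potential real-vs-IS discrepancies per data item and warn when a
# decision depends on a possibly inaccurate value.
from dataclasses import dataclass

@dataclass
class DataItem:
    name: str
    possibly_inaccurate: bool = False  # real value and IS value may differ

def real_world_event(item: DataItem) -> None:
    # The world changes but the IS is not updated: awareness is lost.
    item.possibly_inaccurate = True

def is_update_from_reality(item: DataItem) -> None:
    # e.g., a physical recount resynchronises the IS with reality.
    item.possibly_inaccurate = False

def decision(item: DataItem) -> None:
    if item.possibly_inaccurate:
        print(f"design-time warning: decision reads '{item.name}', "
              "which may be inaccurate")

stock = DataItem("stock_level")
real_world_event(stock)        # goods damaged in the warehouse
decision(stock)                # -> warning: IS still shows the old quantity
is_update_from_reality(stock)  # stocktaking
decision(stock)                # no warning
```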
- Journal article: Prozessgetriebenes Datenqualitätsmanagement durch Integration von Datenqualität in bestehende Prozessmodelle (Wirtschaftsinformatik: Vol. 55, No. 6, 2013) Glowalla, Paul; Sunyaev, Ali. The importance of high data quality and the need to consider data quality in the context of business processes are well acknowledged. Process modeling is mandatory for process-driven data quality management, which seeks to improve and sustain data quality by redesigning processes that create or modify data. A variety of process modeling languages exist, which organizations apply heterogeneously. The purpose of this article is to present a context-independent approach to integrating data quality into the variety of existing process models. The authors aim to improve communication of data quality issues across stakeholders while considering process model complexity. They build on a keyword-based literature review in 74 IS journals and three conferences, reviewing 1,555 articles from 1995 onwards. 26 articles, including 46 process models, were examined in detail. The literature review reveals the need for a context-independent and visible integration of data quality into process models. First, the authors derive the within-model integration, that is, the enhancement of existing process models with data quality characteristics. Second, they derive the across-model integration, that is, the integration of a data-quality-centric process model with existing process models. Since process models are mainly used for communicating processes, the authors consider the impact of integrating data quality, and of applying patterns for complexity reduction, on the models' complexity metrics. There is a need for further research on complexity metrics to improve the applicability of complexity reduction patterns. Missing knowledge about the interdependencies between metrics, as well as missing complexity metrics, impedes the assessment and prediction of process model complexity and thus understandability. Finally, the context-independent approach can be used complementarily to data quality integration approaches that focus on specific process modeling languages.
- Text document: Real or Fake? Large-Scale Validation of Identity Leaks (INFORMATIK 2017, 2017) Maschler, Fabian; Niephaus, Fabio; Risch, Julian. On the Internet, criminal hackers frequently leak identity data on a massive scale. Subsequent criminal activities, such as identity theft and misuse, put Internet users at risk. Leak checker services enable users to check whether their personal data has been made public. However, automatic crawling and identification of leak data is error-prone for different reasons. Based on a dataset of more than 180 million leaked identity records, we propose a software system that identifies and validates identity leaks to improve leak checker services. Furthermore, we present a thorough assessment of leak data quality and of typical characteristics that distinguish valid and invalid leaks.
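To give a flavour of leak validation, the sketch below applies simple, invented heuristics to "email:password" records; the record format is an assumption, and the paper's actual system is considerably more elaborate.

```python
# Heuristic plausibility checks for leaked credential records, filtering
# out likely crawling artifacts (malformed emails, HTML fragments, etc.).
import re

EMAIL_RE = re.compile(r"^[^@\s:;]+@[^@\s:;]+\.[A-Za-z]{2,}$")

def looks_valid(line: str) -> bool:
    """Check that a line is a plausible 'email:password' record."""
    if line.count(":") != 1:
        return False
    email, password = line.split(":")
    if not EMAIL_RE.match(email):
        return False
    # Reject records that are likely crawling artifacts (HTML, binary junk).
    if not password or not password.isprintable() or "<" in password:
        return False
    return True

records = [
    "alice@example.com:hunter2",
    "not an email:whatever",
    "bob@example.org:<html>",
]
for r in records:
    print(r, "->", "plausible" if looks_valid(r) else "likely invalid")
```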
- Journal article: Towards a Conceptualization of Data and Information Quality in Social Information Systems (Business & Information Systems Engineering: Vol. 59, No. 1, 2017) Tilly, Roman; Posegga, Oliver; Fischbach, Kai; Schoder, Detlef. Data and information quality (DIQ) have traditionally been defined in an organizational context and with respect to traditional information systems (IS). Numerous frameworks have been developed to operationalize traditional DIQ accordingly. However, over the last decade, social information systems (SocIS) such as social media have emerged that enable social interaction and open collaboration of voluntary prosumers, rather than supporting specific tasks as traditional IS do in organizations. Based on a systematic literature review, the paper identifies and categorizes prevalent DIQ conceptualizations. The authors differentiate the various understandings of DIQ in light of the unique characteristics of SocIS and conclude that they capture neither DIQ in SocIS well, nor how it is defined, maintained, and improved through social interaction. The paper proposes a new conceptualization of DIQ in SocIS that can explain the interplay of existing conceptualizations and provides the foundation for future research on DIQ in SocIS.