Auflistung nach Schlagwort "Data science"
1 - 4 von 4
Treffer pro Seite
Sortieroptionen
- ZeitschriftenartikelIntelligent User Assistance for Automated Data Mining Method Selection(Business & Information Systems Engineering: Vol. 62, No. 3, 2020) Zschech, Patrick; Horn, Richard; Höschele, Daniel; Janiesch, Christian; Heinrich, KaiIn any data science and analytics project, the task of mapping a domain-specific problem to an adequate set of data mining methods by experts of the field is a crucial step. However, these experts are not always available and data mining novices may be required to perform the task. While there are several research efforts for automated method selection as a means of support, only a few approaches consider the particularities of problems expressed in the natural and domain-specific language of the novice. The study proposes the design of an intelligent assistance system that takes problem descriptions articulated in natural language as an input and offers advice regarding the most suitable class of data mining methods. Following a design science research approach, the paper (i) outlines the problem setting with an exemplary scenario from industrial practice, (ii) derives design requirements, (iii) develops design principles and proposes design features, (iv) develops and implements the IT artifact using several methods such as embeddings, keyword extractions, topic models, and text classifiers, (v) demonstrates and evaluates the implemented prototype based on different classification pipelines, and (vi) discusses the results' practical and theoretical contributions. The best performing classification pipelines show high accuracies when applied to validation data and are capable of creating a suitable mapping that exceeds the performance of joint novice assessments and simpler means of text mining. The research provides a promising foundation for further enhancements, either as a stand-alone intelligent assistance system or as an add-on to already existing data science and analytics platforms.
- ZeitschriftenartikelSWEL: A Domain-Specific Language for Modeling Data-Intensive Workflows(Business & Information Systems Engineering: Vol. 66, No. 2, 2024) Salado-Cid, Rubén; Vallecillo, Antonio; Munir, Kamram; Romero, José RaúlData-intensive applications aim at discovering valuable knowledge from large amounts of data coming from real-world sources. Typically, workflow languages are used to specify these applications, and their associated engines enable the execution of the specifications. However, as these applications become commonplace, new challenges arise. Existing workflow languages are normally platform-specific, which severely hinders their interoperability with other languages and execution engines. This also limits their reusability outside the platforms for which they were originally defined. Following the Design Science Research methodology, the paper presents SWEL (Scientific Workflow Execution Language). SWEL is a domain-specific modeling language for the specification of data-intensive workflows that follow the model-driven engineering principles, covering the high-level definition of tasks, information sources, platform requirements, and mappings to the target technologies. SWEL is platform-independent, enables collaboration among data scientists across multiple domains and facilitates interoperability. The evaluation results show that SWEL is suitable enough to represent the concepts and mechanisms of commonly used data-intensive workflows. Moreover, SWEL facilitates the development of related technologies such as editors, tools for exchanging knowledge assets between workflow management systems, and tools for collaborative workflow development.
- ZeitschriftenartikelThe Collaborative Research Center FONDA(Datenbank-Spektrum: Vol. 21, No. 3, 2021) Leser, Ulf; Hilbrich, Marcus; Draxl, Claudia; Eisert, Peter; Grunske, Lars; Hostert, Patrick; Kainmüller, Dagmar; Kao, Odej; Kehr, Birte; Kehrer, Timo; Koch, Christoph; Markl, Volker; Meyerhenke, Henning; Rabl, Tilmann; Reinefeld, Alexander; Reinert, Knut; Ritter, Kerstin; Scheuermann, Björn; Schintke, Florian; Schweikardt, Nicole; Weidlich, MatthiasToday’s scientific data analysis very often requires complex Data Analysis Workflows (DAWs) executed over distributed computational infrastructures, e.g., clusters. Much research effort is devoted to the tuning and performance optimization of specific workflows for specific clusters. However, an arguably even more important problem for accelerating research is the reduction of development, adaptation, and maintenance times of DAWs. We describe the design and setup of the Collaborative Research Center (CRC) 1404 “FONDA -– Foundations of Workflows for Large-Scale Scientific Data Analysis”, in which roughly 50 researchers jointly investigate new technologies, algorithms, and models to increase the portability, adaptability, and dependability of DAWs executed over distributed infrastructures. We describe the motivation behind our project, explain its underlying core concepts, introduce FONDA’s internal structure, and sketch our vision for the future of workflow-based scientific data analysis. We also describe some lessons learned during the “making of” a CRC in Computer Science with strong interdisciplinary components, with the aim to foster similar endeavors.
- ZeitschriftenartikelThe First Data Science Challenge at BTW 2017(Datenbank-Spektrum: Vol. 17, No. 3, 2017) Hirmer, Pascal; Waizenegger, Tim; Falazi, Ghareeb; Abdo, Majd; Volga, Yuliya; Askinadze, Alexander; Liebeck, Matthias; Conrad, Stefan; Hildebrandt, Tobias; Indiono, Conrad; Rinderle-Ma, Stefanie; Grimmer, Martin; Kricke, Matthias; Peukert, EricThe 17th Conference on Database Systems for Business, Technology, and Web (BTW2017) of the German Informatics Society (GI) took place in March 2017 at the University of Stuttgart in Germany. A Data Science Challenge was organized for the first time at a BTW conference by the University of Stuttgart and Sponsor IBM. We challenged the participants to solve a data analysis task within one month and present their results at the BTW. In this article, we give an overview of the organizational process surrounding the Challenge, and introduce the task that the participants had to solve. In the subsequent sections, the final four competitor groups describe their approaches and results.