Listing by author "Hagedorn, Stefan"
1 - 9 of 9
- Text document: Applying Machine Learning Models to Scalable DataFrames with Grizzly (BTW 2021, 2021). Kläbe, Steffen; Hagedorn, Stefan. The popular Python Pandas framework provides an easy-to-use DataFrame API that enables a broad range of users to analyze their data. However, Pandas faces severe scalability issues in terms of runtime and memory consumption, limiting the usability of the framework. In this paper we present Grizzly, a replacement for Python Pandas. Instead of bringing data to the operators as Pandas does, Grizzly ships program complexity to database systems by transpiling the DataFrame API to SQL code. Additionally, Grizzly offers user-friendly support for combining different data sources, for user-defined functions, and for applying Machine Learning models directly inside the database system. Our evaluation shows that Grizzly significantly outperforms Pandas as well as state-of-the-art frameworks for distributed Python processing in several use cases.
- Journal article: Complex Event Processing on Linked Stream Data (Datenbank-Spektrum: Vol. 15, No. 2, 2015). Saleh, Omran; Hagedorn, Stefan; Sattler, Kai-Uwe. Social networks and Sensor Web technologies typically generate a massive amount of data published as streams. To give these streams meaning and enrich them with semantic descriptions, the concept of Linked Stream Data (LSD) has emerged. However, supporting a wide range of LSD scenarios and queries requires comprehensive solutions that provide not only classic data stream operators such as windows, but also processing of complex events, linking of (static) datasets, and scalable processing. In this paper, we present our approach for processing LSD and addressing these requirements. In contrast to existing LSD engines relying on streaming extensions to SPARQL, our PipeFlow system is a (relational) dataflow language and engine providing support for complex event processing (CEP) and a few dedicated operators for RDF data. We describe this language, and particularly the CEP model, as well as the system architecture for parallel CEP and LSD processing, which exploits partitioning techniques for cluster environments. Finally, we report results from experiments evaluating our system in comparison to existing LSD engines.
- Conference paper: d.fence - Sicherer Fernzugriff auf Agrarsteuerungen mit Hilfe eines SSH-Reverse-Proxys mit grafischer Bedien- und Verwaltungsoberfläche (38. GIL-Jahrestagung, Digitale Marktplätze und Plattformen, 2018). Hagedorn, Stefan. For operators of measurement and control installations, the d.fence system offers the possibility of implementing secure and robust remote access as a standard across their entire product range, where today they still face a heterogeneous landscape that is shaped largely by their end customers or by those customers' partly semi-professional IT staff.
- Conference paper: Gesture-based navigation in graph databases - the kevin bacon game (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Beier, Felix; Baumann, Stephan; Betz, Heiko; Hagedorn, Stefan; Wagner, Timo. Motion sensing devices like Microsoft's Kinect offer an alternative to traditional computer input devices like keyboards and mice. Graph databases can naturally make use of gesture control, as traversing graphs can easily be described by swiping or pointing gestures. In our demo we traverse the Internet Movie Database (IMDB) using a Kinect interface with the control logic in our data stream engine AnduIN. Gesture detection is based on AnduIN's complex event processing functionality.
- Conference paper: JPTest - Grading Data Science Exercises in Jupyter Made Short, Fast and Scalable (BTW 2023, 2023). Tröbs, Eric; Hagedorn, Stefan; Sattler, Kai-Uwe. Jupyter Notebook is not only a popular tool for publishing data science results, but can also be used for the interactive explanation of teaching content as well as the supervised work on exercises. In order to give students feedback on their solutions, it is necessary to check and evaluate the submitted work. To exploit the possibilities of remote learning as well as to reduce the work needed to evaluate submissions, we present a flexible and efficient framework. It enables automated checking of notebooks for completeness and syntactic correctness as well as fine-grained evaluation of submitted tasks. The framework comes with a high level of parallelization, isolation and a short and efficient API.
- Text document: Peaks and the Influence of Weather, Traffic, and Events on Particulate Pollution (BTW 2019 – Workshopband, 2019). Hagedorn, Stefan; Sattler, Kai-Uwe. The task of the Data Science Challenge as part of the BTW 2019 conference is to analyze air quality data collected by the luftdaten project. This project provides sensor measurements recorded by volunteers around the world. With do-it-yourself setups, people can deploy their own sensors and report various environmental values to the project’s servers, where they are made available as open data for further analyses. Thus, data is available only in regions where volunteers decided to participate in the project. Since only very few sensors are present in our city, Ilmenau, and in the state of Thuringia, we decided to shift our focus to a broader area around Thuringia.
- Text document: Processing Large Raster and Vector Data in Apache Spark (BTW 2019, 2019). Hagedorn, Stefan; Birli, Oliver; Sattler, Kai-Uwe. Spatial data processing frameworks are in many cases limited to vector data only. However, an important type of spatial data is raster data, which is produced by sensors on satellites but also by high-resolution cameras taking pictures of nanostructures, such as chips on wafers. Often the raster data sets become large and need to be processed in parallel in a cluster environment. In this paper we demonstrate our STARK framework with its support for raster data and its functionality to combine raster and vector data in filter and join operations. To save engineers from the burden of learning a programming language, queries can be formulated in SQL in a web interface. In the demonstration, users can use this web interface to inspect examples of raster data using our extended SQL queries on an Apache Spark cluster.
- Conference paper: Sparqling pig - processing linked data with pig Latin (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Hagedorn, Stefan; Hose, Katja; Sattler, Kai-Uwe. In recent years, dataflow languages such as Pig Latin have emerged as flexible and powerful tools for handling complex analysis tasks on big data. These languages support schema flexibility as well as common programming patterns such as iteration. They offer extensibility through user-defined functions while running on top of scalable distributed platforms. In doing so, these languages enable analytical tasks while avoiding the limitations of classical query languages such as SQL and SPARQL. However, the tuple-oriented view of general-purpose languages like Pig does not match the specifics of modern datasets available on the Web very well, since these often use the RDF data model. Graph patterns, for instance, are one of the core concepts of SPARQL but have to be formulated as explicit joins, which burdens the user with the details of efficient query processing strategies. In this paper, we address this problem by proposing extensions to Pig that deal with linked data in RDF to bridge the gap between Pig and SPARQL for analytics. These extensions are realized by a set of user-defined functions and rewriting rules, so that the enhanced Pig scripts can still be compiled to plain MapReduce programs. For all proposed extensions, we discuss possible rewriting strategies and present results from an experimental evaluation.
- Conference paper: The STARK Framework for Spatio-Temporal Data Analytics on Spark (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Hagedorn, Stefan; Götze, Philipp; Sattler, Kai-Uwe. Big Data sets can contain all types of information: from server log files to tracking information of mobile users with their location at a point in time. Apache Spark has been widely accepted for Big Data analytics because of its very fast processing model. However, Spark has no native support for spatial or spatio-temporal data. Spatial filters or joins using, e.g., a contains predicate are not supported and would have to be implemented inefficiently by the users. Also, Spark cannot make use of, e.g., spatial distribution for optimal partitioning. Here we present our STARK framework that adds spatio-temporal support to Spark. It includes spatial partitioners, different modes for indexing, as well as filter, join, and clustering operators. In contrast to existing solutions, STARK integrates seamlessly into any (Scala) Spark program and provides more flexible and comprehensive operators. Furthermore, our experimental evaluation shows that our implementation outperforms existing solutions.
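
The Grizzly entry in this listing describes transpiling a DataFrame API to SQL so that the work runs inside the database rather than in the client. The general idea can be illustrated with a minimal sketch. The `LazyFrame` class and its `select`, `where`, `to_sql`, and `collect` methods are hypothetical names invented for this illustration and are not Grizzly's actual API; SQLite stands in for the database system.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy DataFrame: records operations and emits SQL on demand."""

    _ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "<>"}

    def __init__(self, table, columns=None, predicates=None):
        self.table = table
        self.columns = list(columns or [])
        self.predicates = list(predicates or [])

    def select(self, *columns):
        # Projection: only remember which columns were requested.
        return LazyFrame(self.table, columns, self.predicates)

    def where(self, column, op, value):
        # Filter: remember the predicate; nothing is executed yet.
        if op not in self._ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        return LazyFrame(
            self.table, self.columns, self.predicates + [(column, op, value)]
        )

    def to_sql(self):
        # Translate the recorded operations into one parameterized query.
        # Note: table and column names are interpolated without sanitizing,
        # which is acceptable only in a sketch with trusted identifiers.
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.predicates:
            conditions = [f"{col} {op} ?" for col, op, _ in self.predicates]
            sql += " WHERE " + " AND ".join(conditions)
        params = [value for _, _, value in self.predicates]
        return sql, params

    def collect(self, conn):
        # Only now does data move: the database evaluates the whole query.
        sql, params = self.to_sql()
        return conn.execute(sql, params).fetchall()


# Example data in an in-memory SQLite database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT, score REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "a", 0.5), (2, "b", 0.9), (3, "a", 0.7)],
)

frame = (
    LazyFrame("events")
    .where("kind", "=", "a")
    .where("score", ">", 0.6)
    .select("id", "score")
)
sql, params = frame.to_sql()
rows = frame.collect(conn)
print(sql)
print(rows)
```

Because each method only returns a new description of the query, the filter and projection are evaluated by the database in a single statement, and only the matching rows are transferred to the Python process.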