P290 - BTW2019 - Datenbanksysteme für Business, Technologie und Web - Workshopband
Listing of P290 - BTW2019 - Datenbanksysteme für Business, Technologie und Web - Workshopband by publication date
- Software solutions for form-based, mobile data collection – A comparative evaluation (BTW 2019 – Workshopband, 2019)
  Steinberg, Markus; Schindler, Sirko; Klan, Friederike
  Many citizen science projects rely on their contributors going to the field and collecting data. Due to their wide availability and increasing capability, modern mobile devices have become an indispensable tool to ease the collection process. Projects can publish mobile apps that allow contributors to easily collect data and submit their results. The requirements of individual projects often overlap to a large extent, which triggered the development of multiple generic frameworks. These allow new projects to quickly generate customized apps and reuse existing infrastructure. However, the wide landscape of tools with diverging capabilities requires projects to compare and choose. This report supports data managers in making an informed decision. We report primarily on our experiences with the whole data collection workflow, from setting up your own instance to analyzing the retrieved data. We compare eight tools, both free and commercial, according to the features provided and difficulties encountered.
- StaRAI or StaRDB? (BTW 2019 – Workshopband, 2019)
  Braun, Tanya
  This tutorial aims at connecting databases and statistical relational AI (StaRAI), demonstrating how database systems can benefit from methods developed within StaRAI, e.g., for implementing efficient systems combining databases and StaRAI. Thus, the goal of this tutorial is two-fold: (i) present an overview of methods within StaRAI; (ii) provide a forum for members of both communities to exchange ideas.
- Computation Offloading in JVM-based Dataflow Engines (BTW 2019 – Workshopband, 2019)
  Gavriilidis, Haralampos
  State-of-the-art dataflow engines, such as Apache Spark and Apache Flink, scale out on large clusters for a variety of data-processing tasks, including machine learning and data mining algorithms. However, being based on the JVM, they are unable to apply optimizations supported by modern CPUs. In contrast, specialized data processing frameworks scale up by exploiting modern CPU characteristics. The goal of this thesis is to find the sweet spot between scale-out and scale-up systems by offloading computation from dataflow engines to specialized systems. We propose two computation offloading methods, reason about their applicability, and implement a prototype based on Apache Spark. Our evaluation shows that for compute-intensive tasks, computation offloading leads to performance improvements of up to 2.5x. For certain UDF scenarios, computation offloading performs worse by up to a factor of 3x: our microbenchmarks show that 80% of the time is spent on serialization operations. By employing data exchange without serialization, computation offloading achieves performance improvements of up to 10x.
- 1st Workshop on Novel Data Management Ideas on Heterogeneous (Co-)Processors (NoDMC) (BTW 2019 – Workshopband, 2019)
  Broneske, David; Habich, Dirk
- Quality Indicators for Text Data (BTW 2019 – Workshopband, 2019)
  Kiefer, Cornelia
  Textual data sets vary in terms of quality. They have different characteristics, such as the average sentence length or the number of spelling mistakes and abbreviations. These text characteristics influence the quality of text mining results. They may be measured automatically by means of quality indicators. We present indicators that we implemented based on natural language processing libraries such as Stanford CoreNLP and NLTK. We discuss design decisions in the implementation of exemplary indicators and provide all indicators on GitHub. In the evaluation, we investigate free texts from production, news, prose, tweets and chat data and show that the suggested indicators predict the quality of two text mining modules.
- Chain-detection for DBSCAN (BTW 2019 – Workshopband, 2019)
  Held, Janis; Beer, Anna; Seidl, Thomas
  Chains connecting two or more different clusters are a well-known problem of DBSCAN, probably the most famous density-based clustering algorithm. Since even a small number of points resulting from, e.g., noise can form such a chain and build a bridge between different clusters, the results of DBSCAN can be distorted: several disparate clusters get merged into one. This single-link effect is well known, but to the best of our knowledge there are not yet any satisfactory solutions that extract those chains. We present a new algorithm that detects not only straight chains between clusters, but also bent and noisy ones. Users can choose between eliminating one-dimensional and higher-dimensional chains connecting clusters in order to recover the underlying cluster structure with DBSCAN. The desired straightness can also be set by the user. We tested our efficient algorithm on a dataset containing traffic accidents in Great Britain and were able to detect chains emerging from streets between cities and villages, which led to clusters composed of diverse villages.
- Peaks and the Influence of Weather, Traffic, and Events on Particulate Pollution (BTW 2019 – Workshopband, 2019)
  Hagedorn, Stefan; Sattler, Kai-Uwe
  The task of the Data Science Challenge as part of the BTW 2019 conference is to analyze air quality data collected by the luftdaten project. This project provides sensor measurements recorded by volunteers around the world. With do-it-yourself setups, people can deploy their own sensors and report various environmental values to the project’s servers, where they are made available as open data for further analyses. Thus, data is available only in regions where volunteers decided to participate in the project. Since only very few sensors are present in our city, Ilmenau, as well as in the state of Thuringia, we decided to shift our focus to a broader area around Thuringia.
- DPI: The Data Processing Interface for Modern Networks (Extended Abstract) (BTW 2019 – Workshopband, 2019)
  Binnig, Carsten
- ReProVide: Towards Utilizing Heterogeneous Partially Reconfigurable Architectures for Near-Memory Data Processing (BTW 2019 – Workshopband, 2019)
  Becher, Andreas; Herrmann, Achim; Wildermann, Stefan; Teich, Jürgen
  Reconfigurable hardware such as Field-programmable Gate Arrays (FPGAs) is widely used for data processing in databases. Most of the related work focuses on accelerating one or a small set of specific operations such as sort, join, and regular expression matching. A drawback of such approaches is often the assumed static accelerator hardware architecture: rather than adapting the hardware to fit the query, the query plan has to be adapted to fit the hardware. Moreover, operators or data types that are not supported by the accelerator have to be processed in software. As a remedy, approaches that exploit the dynamic partial reconfigurability of FPGAs have been proposed, which are able to adapt the datapath at runtime. However, on modern FPGAs, this introduces new challenges due to the heterogeneity of the available resources. In addition, not only the execution resources but also the memory resources may be heterogeneous. This work focuses on the architectural aspects of database (co-)processing on heterogeneous FPGA-based PSoC (programmable System-on-Chip) architectures including processors, specialized hardware components, multiple memory types and dynamically partially reconfigurable areas. We present an approach to support such (co-)processing called ReProVide. In particular, we introduce a model to formalize the challenging task of operator placement and buffer allocation onto such heterogeneous hardware and describe the difficulties of finding good placements. Furthermore, we give detailed insight into different memory types and their peculiarities in order to exploit the strengths of heterogeneous memory architectures. Here, we also highlight the implications of heterogeneous memories for the problem of query placement.
- Context Selection in a Heterogeneous Legal Ontology (BTW 2019 – Workshopband, 2019)
  Wehnert, Sabine; Fenske, Wolfram; Saake, Gunter
  Ontology building in the legal domain is subject to ongoing research. Taxonomic ontologies provide, for instance, concept hierarchies for term definitions, annotations, query expansion and support for inferences. However, the context-dependent application of statutory legal texts is hard to model, often leading to a limited ontology scope and fixed terminology to avoid conflicts. In previous work, we presented a method to create a lightweight heterogeneous ontology from textbooks offering connections between laws, while avoiding an error-prone and costly ontology alignment step. In our ontology, laws are linked by common contexts. We propose a new data model so that the context can be explored and selected by a user, which is necessary for many applications, such as recommender systems. To obtain the relevant user context, we added a mechanism to retrieve linked laws from our ontology, given a scope of user interest and context information for each law.
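
The single-link effect named in the "Chain-detection for DBSCAN" entry above can be reproduced on a toy example. The sketch below is a minimal textbook DBSCAN in plain Python, not the authors' chain-detection algorithm; the point coordinates, the parameter values (`eps=1.0`, `min_pts=3`) and all function names are assumptions chosen purely for illustration. It shows two separate blobs being reported as two clusters, and as one cluster once a thin chain of points bridges them.

```python
from collections import deque


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point, -1 marks noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = noise
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == noise:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbors)
        cluster += 1
    return labels


def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {-1})


# Synthetic data (assumption): two 3x3 blobs and a one-dimensional chain between them.
blobs = [(x, y) for x in range(0, 3) for y in range(3)] + \
        [(x, y) for x in range(10, 13) for y in range(3)]
chain = [(x, 1) for x in range(3, 10)]

print(count_clusters(dbscan(blobs, eps=1.0, min_pts=3)))          # 2
print(count_clusters(dbscan(blobs + chain, eps=1.0, min_pts=3)))  # 1
```

Every chain point has itself and two chain or blob neighbors within `eps`, so each one qualifies as a core point and the density-reachability relation carries the first cluster's label all the way into the second blob.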