Datenbank Spektrum 17(3) - November 2017

Recent publications

1 - 10 of 11
  • Journal article
    Preserving Recomputability of Results from Big Data Transformation Workflows
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Kricke, Matthias; Grimmer, Martin; Schmeißer, Michael
    The ability to recompute results from raw data at any time is important for data-driven companies to ensure data stability and to selectively incorporate new data into an already delivered data product. However, data transformation processes are heterogeneous, and manual work by domain experts may be part of the process that creates a deliverable data product. Because domain experts and their work are expensive and time-consuming, a recomputation process needs to be able to reapply former human interactions automatically. This becomes even more challenging when external systems are used or data changes over time. In this paper, we propose a system architecture that ensures recomputability of results from big data transformation workflows on internal and external systems by using distributed key-value data stores. Furthermore, the system architecture makes it possible to incorporate human interactions from former data transformation processes. We describe how our approach significantly relieves external systems and at the same time increases the performance of the big data transformation workflows.
  • Journal article
    VAT: A Scientific Toolbox for Interactive Geodata Exploration
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Beilschmidt, Christian; Drönner, Johannes; Mattig, Michael; Schmidt, Marco; Authmann, Christian; Niamir, Aidin; Hickler, Thomas; Seeger, Bernhard
    Data-driven research requires interactive systems supporting fast and intuitive data exploration. An important component is the user interface that facilitates this process. In biodiversity research, data is commonly spatio-temporal in nature. This poses unique opportunities for visual analytics approaches. In this paper, we present the core concepts of the web-based front end of our VAT (Visualization, Analysis and Transformation) system, a distributed geo-processing application. We present the results of two user studies and highlight unique features, including those for the management of time and the generalization of data.
  • Journal article
    Data Lakes
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Mathis, Christian
    By moving data into a centralized, scalable storage location inside an organization (the data lake), companies and other institutions aim to discover new information and to generate value from the data. The data lake can help to overcome organizational boundaries and system complexity. However, generating value from the data requires additional techniques, tools, and processes that help to overcome data integration and other challenges around this approach. Although there is a certain agreed-on notion of the central idea, there is no accepted definition of which components or functionality a data lake has or what its architecture looks like. In this article, we start with the central idea and discuss various related aspects and technologies.
  • Journal article
    The First Data Science Challenge at BTW 2017
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Hirmer, Pascal; Waizenegger, Tim; Falazi, Ghareeb; Abdo, Majd; Volga, Yuliya; Askinadze, Alexander; Liebeck, Matthias; Conrad, Stefan; Hildebrandt, Tobias; Indiono, Conrad; Rinderle-Ma, Stefanie; Grimmer, Martin; Kricke, Matthias; Peukert, Eric
    The 17th Conference on Database Systems for Business, Technology, and Web (BTW2017) of the German Informatics Society (GI) took place in March 2017 at the University of Stuttgart in Germany. For the first time at a BTW conference, a Data Science Challenge was organized by the University of Stuttgart and the sponsor IBM. We challenged the participants to solve a data analysis task within one month and present their results at the BTW. In this article, we give an overview of the organizational process surrounding the Challenge and introduce the task that the participants had to solve. In the subsequent sections, the final four competitor groups describe their approaches and results.
  • Journal article
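    Einsatz eines Datenstrommanagementsystems als Framework für Online-Recommender-Systeme am Beispiel der Nachrichtenempfehlungen [Using a Data Stream Management System as a Framework for Online Recommender Systems, Illustrated by News Recommendations]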
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Ludmann, Cornelius A.
    As part of the CLEF NewsREEL Challenge, participants can evaluate recommender systems for news articles in live operation and compete against one another. To this end, they are notified of impressions through events and receive requests to which they must respond with recommendations. These recommendations are then displayed to the users. The organizers measure how many recommendations are actually clicked. One challenge is processing the events promptly so that recommendations can be returned within a fixed time limit. In this article, we present our approach based on the data stream management system "Odysseus", with which we recommend popular news articles by means of continuously running queries. With this approach, we held our own against the other participants in the CLEF NewsREEL Challenge 2016 and achieved the most clicks on our recommendations.
  • Journal article
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Härder, Theo
  • Journal article
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Härder, Theo
  • Journal article
    SQL/JSON Standard: Properties and Deficiencies
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Petković, Dušan
    A new era of application development is emerging, based on the ease of access to modern compute resources such as mobile devices. This access can be supported using JSON (JavaScript Object Notation). Support for storage of and query access to JSON documents in the context of relational DBMSs is therefore necessary. For this reason, the SQL standardization committee published a proposal called SQL/JSON. In this paper, we discuss the JSON features specified in the proposal and show to what extent different relational database systems have integrated them. At the end of the paper, we describe the main drawbacks of the proposal and ways to solve them. From our point of view, the following should be specified in a future proposal of SQL/JSON: JSON documents should be first-class objects in SQL (native storage). Handling JSON documents as first-class objects in SQL would offer greater capability for users and better performance. Support for modifying parts of a JSON document using the SQL UPDATE statement is necessary. Direct access to external JSON data should be supported, too.
  • Journal article
    Ranking Specific Sets of Objects
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Maly, Jan; Woltran, Stefan
    Ranking sets of objects based on an order between the single elements has been thoroughly studied in the literature. In particular, it has been shown that it is in general impossible to find a total ranking on the whole power set of objects that jointly satisfies properties such as dominance and independence. However, in many applications certain elements of the power set might not be required and can be neglected in the ranking process. For instance, certain sets might be ruled out by hard constraints or because they do not satisfy some background theory. In this paper, we treat the computational problem of whether, given a ranking on the elements, an order can be found on a given subset of the power set that satisfies different variants of dominance and independence. We show that this problem is tractable for partial rankings and NP-complete for total rankings.
  • Journal article
    Energy Efficiency in Main-Memory Databases
    (Datenbank-Spektrum: Vol. 17, No. 3, 2017) Noll, Stefan; Funke, Henning; Teubner, Jens
    As the operating costs of today’s data centres continue to increase and processor manufacturers are forced to meet thermal design power constraints when designing new hardware, the energy efficiency of a main-memory database management system becomes more and more important. In addition, many database workloads are more memory-intensive than compute-intensive, which leaves computing power unused and wasted. This can become a problem because wasting computing power also means wasting electrical power. In this paper, we experimentally study the impact of reducing the clock frequency of the processor and the impact of using fewer processor cores on the energy efficiency of common database algorithms such as scans, simple aggregations, simple hash joins, and state-of-the-art join algorithms. We stress the fundamental trade-off between peak performance and energy efficiency, as opposed to the established race-to-idle strategy. Ultimately, we show that reducing unused computing power significantly improves the energy efficiency of memory-bound database algorithms.
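
The idea in the abstract by Kricke, Grimmer and Schmeißer above, of recording answers from external systems and human experts in a key-value store so that a later recomputation can replay them, might be sketched as follows. This is only an illustration and not the authors' architecture: the class `RecordingStore`, the in-memory dictionary standing in for a distributed key-value store, and the `geocode` function standing in for an external system are all hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict


class RecordingStore:
    """In-memory stand-in (hypothetical) for a distributed key-value store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.external_calls = 0  # how often the external system was actually asked

    @staticmethod
    def _key(step: str, payload: dict) -> str:
        # Deterministic key: workflow step name plus a hash of the canonical input.
        canonical = json.dumps(payload, sort_keys=True)
        return step + ":" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def call(self, step: str, payload: dict, external: Callable[[dict], dict]) -> dict:
        key = self._key(step, payload)
        if key not in self._data:
            # First computation: ask the external system (or expert) and record the answer.
            self.external_calls += 1
            self._data[key] = json.dumps(external(payload))
        # Recomputation: replay the recorded answer without touching the external system.
        return json.loads(self._data[key])


def geocode(payload: dict) -> dict:
    # Hypothetical external service; a real one could change its answers over time.
    return {"city": payload["address"].split(",")[-1].strip()}


store = RecordingStore()
first = store.call("geocode", {"address": "Main Street 1, Leipzig"}, geocode)
again = store.call("geocode", {"address": "Main Street 1, Leipzig"}, geocode)
```

In this sketch the second call returns the recorded answer, so the external system is contacted only once. A manual correction by a domain expert could be recorded the same way, by passing a function that asks the expert as `external`.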
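
The "continuously running queries" that recommend popular news articles in the abstract by Ludmann above can be pictured as a sliding-window count over the impression events. The sketch below is a plain Python stand-in and makes no claim about Odysseus or its query language: the class `PopularityWindow`, the window length, and the event format are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class PopularityWindow:
    """Counts impressions per article over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[int, str]] = deque()  # (timestamp, article_id)
        self._counts: Counter = Counter()

    def on_impression(self, timestamp: int, article_id: str) -> None:
        # Called for every incoming impression event.
        self._events.append((timestamp, article_id))
        self._counts[article_id] += 1
        self._expire(timestamp)

    def _expire(self, now: int) -> None:
        # Drop events whose timestamp has fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_id = self._events.popleft()
            self._counts[old_id] -= 1
            if self._counts[old_id] == 0:
                del self._counts[old_id]

    def recommend(self, now: int, k: int) -> List[str]:
        # Answer a request with the k most viewed articles in the current window.
        self._expire(now)
        return [article for article, _ in self._counts.most_common(k)]


window = PopularityWindow(window_seconds=60)
for ts, article in [(0, "a"), (30, "b"), (35, "b"), (40, "b"), (50, "c"), (52, "c")]:
    window.on_impression(ts, article)

early = window.recommend(now=55, k=2)  # all events still inside the window
late = window.recommend(now=95, k=2)   # events up to timestamp 35 have expired
```

Keeping the counts up to date on every event means a request only has to read the current top entries, which is one way to stay within a fixed response time.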
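
For the abstract by Maly and Woltran above, the following sketch checks whether a given ranking on a family of sets satisfies dominance. It assumes the common textbook formulation: adding an element better than everything in A must make the set strictly better, and adding an element worse than everything in A must make it strictly worse. The variants studied in the paper may differ. The function only verifies a candidate ranking; it does not decide whether a suitable ranking exists, which is the problem the paper analyzes. The name `dominance_violations` and the encoding (integers ordered naturally, higher score means more preferred) are assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Maps each ranked set to a score; a higher score means a more preferred set.
Ranking = Dict[FrozenSet[int], int]


def dominance_violations(ranking: Ranking) -> List[Tuple[FrozenSet[int], int]]:
    """Return pairs (A, x) where A and A | {x} are both ranked but dominance fails."""
    violations = []
    for a_set, score in ranking.items():
        if not a_set:
            continue  # dominance says nothing about the empty set
        for bigger, bigger_score in ranking.items():
            extra = bigger - a_set
            # Only compare A with supersets that add exactly one element.
            if len(extra) != 1 or not a_set < bigger:
                continue
            (x,) = extra
            if x > max(a_set) and not bigger_score > score:
                violations.append((a_set, x))  # adding a better element must improve A
            elif x < min(a_set) and not score > bigger_score:
                violations.append((a_set, x))  # adding a worse element must worsen A
    return violations


good = {frozenset({1}): 0, frozenset({1, 2}): 1, frozenset({2}): 2}
bad = {frozenset({1}): 2, frozenset({1, 2}): 1, frozenset({2}): 0}

good_result = dominance_violations(good)
bad_result = dominance_violations(bad)
```

Because only sets that appear in `ranking` are compared, the check applies to a restricted family of sets rather than the whole power set, mirroring the setting described in the abstract.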