P180 - BTW2011 - Datenbanksysteme für Business, Technologie und Web
Listing by date of publication (entries 1 - 10 of 57)
- Conference paper: Benchmarking hybrid OLTP&OLAP database systems (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Funke, Florian; Kemper, Alfons; Neumann, Thomas
  Recently, the case has been made for operational or real-time Business Intelligence (BI). As the traditional separation into an OLTP database and an OLAP data warehouse incurs severe latency disadvantages for operational BI, hybrid OLTP&OLAP database systems are being developed. The advent of the first generation of such hybrid systems requires means to characterize their performance. While there are standardized and widely used benchmarks addressing either OLTP or OLAP workloads, the lack of a hybrid benchmark led us to define a new mixed-workload benchmark, called TPC-CH. This benchmark bridges the gap between the existing single-workload suites: TPC-C for OLTP and TPC-H for OLAP. The newly proposed TPC-CH benchmark executes a mixed workload: a transactional workload based on the order-entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite on the same sales database. As it is derived from the two most widely used TPC benchmarks, TPC-CH produces results that are highly comparable to both hybrid systems and classic single-workload systems. Thus, we are able to compare the performance of our own (and other) hybrid database systems running OLTP and OLAP workloads in parallel with the OLTP performance of dedicated transactional systems (e.g., VoltDB) and the OLAP performance of specialized OLAP databases (e.g., column stores such as MonetDB).
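  The abstract above describes a mixed workload that interleaves TPC-C-style order-entry transactions with TPC-H-style analytical queries over the same data. A minimal sketch of such a driver, using a toy schema and simplified statements (not the actual TPC-CH kit):

  ```python
  # Minimal sketch of a mixed OLTP/OLAP workload driver in the spirit of TPC-CH.
  # Schema and statements are hypothetical placeholders, not the real benchmark.
  import sqlite3, time, random

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE orders(id INTEGER PRIMARY KEY, customer INT, amount REAL)")

  def oltp_new_order(i):
      # transactional side: order entry, in the spirit of TPC-C's NewOrder
      con.execute("INSERT INTO orders VALUES (?, ?, ?)",
                  (i, random.randrange(1000), random.uniform(1, 500)))
      con.commit()

  def olap_revenue_report():
      # analytical side: a TPC-H-style aggregate over the same sales data
      return con.execute("SELECT customer, SUM(amount) FROM orders "
                         "GROUP BY customer ORDER BY 2 DESC LIMIT 5").fetchall()

  start, queries = time.time(), 0
  for i in range(10_000):
      oltp_new_order(i)
      if i % 100 == 99:          # interleave one OLAP query per 100 transactions
          olap_revenue_report()
          queries += 1
  elapsed = time.time() - start
  print(f"{10_000/elapsed:.0f} tx/s, {queries/elapsed:.1f} analytical queries/s")
  ```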
- Conference paper: The power of declarative languages: from information extraction to machine learning (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Vaithyanathan, Shivakumar
  As advanced analytics has become more mainstream in enterprises, usability and system-managed performance optimizations are critical for its wide adoption. As a result, there is active interest in the design of declarative languages in several analytics areas. In this talk I will describe the efforts at IBM in three areas, namely Information Extraction, Entity Resolution, and Machine Learning. I will detail these efforts at some length and also explain the motivation behind some of the design choices made while implementing declarative solutions for the individual areas. I will end with results that demonstrate multiple advantages of the declarative approaches compared with existing solutions.
- Conference paper: View maintenance using partial deltas (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Jörg, Thomas; Dessloch, Stefan
  This paper addresses the maintenance of materialized views in a warehousing environment, where views reside on a remote database. We analyze so-called Change Data Capture (CDC) techniques used to capture changes (also referred to as deltas) at the source systems. We show that many existing CDC techniques do not provide complete deltas but rather incomplete (or partial) deltas. Traditional view maintenance techniques, however, require complete deltas as input. We propose a generalized technique that allows a class of materialized views to be maintained using partial deltas.
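  A toy illustration of the complete-vs-partial distinction the abstract draws (not the paper's algorithm): a complete delta carries the old and new value of every changed row, so an aggregate view can be adjusted in place; a partial delta that records only new values, as many CDC techniques produce, is not sufficient for this view.

  ```python
  # Incrementally maintaining a SUM view from a delta (illustrative only).
  view = {"c1": 100.0}  # materialized view: SUM(amount) GROUP BY customer

  complete_delta = [("update", "c1", {"old": 40.0, "new": 55.0})]
  partial_delta  = [("update", "c1", {"new": 55.0})]  # old value not captured

  def apply(delta, view):
      for op, key, vals in delta:
          if op == "update":
              if "old" not in vals:
                  raise ValueError("partial delta: old value needed to adjust SUM")
              view[key] += vals["new"] - vals["old"]  # incremental adjustment
      return view

  print(apply(complete_delta, dict(view)))   # {'c1': 115.0}
  try:
      apply(partial_delta, dict(view))
  except ValueError as e:
      print("cannot maintain view:", e)      # required information is missing
  ```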
- Conference paper: Interactive predictive analytics with columnar databases (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Oberhofer, Martin; Wurst, Michael
  Predictive analytics is usually seen as a highly interactive task. Paradoxically, it is still performed mostly as a batch task. This not only limits its applicability, it also sets it apart from a task that is conceptually very close to it, namely OLAP analysis. The main reason for considering mining a batch task is the usually very high execution time on large data warehouses. While novel hardware offers the ability to execute predictive analytics algorithms in a highly distributed fashion, this level of parallelism cannot be exploited within the traditional row-based database paradigm. Columnar databases offer a solution to this problem, as the underlying data structures lend themselves very well to parallel execution. This reduces the response time for mining queries by several orders of magnitude for some algorithms. While making mining faster and more responsive is already valuable in itself, the real value of low response times is that they allow completely new ways of interacting with huge data warehouses. In this article we give a survey of the opportunities and challenges of interactive, OLAP-like mining and of how columnar databases can support it. We exemplify these ideas on a task that is especially attractive for interactive mining, namely outlier detection in large data warehouses.
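  A toy sketch of why a columnar layout suits scan-heavy mining such as the outlier detection named above (illustrative, not from the paper): a row layout forces touching every field of every record, while a column store scans only the needed column, which also vectorizes well.

  ```python
  # Row-at-a-time scan vs. a vectorized scan over a single column (toy data).
  import numpy as np
  import time

  n = 1_000_000
  rows = [(i, f"cust{i}", float(i % 1000)) for i in range(n)]   # row layout
  amounts = np.array([r[2] for r in rows])                      # one column

  t = time.time()
  mu = sum(r[2] for r in rows) / n                              # row-store style scan
  row_s = time.time() - t

  t = time.time()
  mu2, sigma = amounts.mean(), amounts.std()                    # column-store style scan
  col_s = time.time() - t

  outliers = np.flatnonzero(np.abs(amounts - mu2) > 3 * sigma)  # z-score outliers
  print(f"row scan {row_s:.3f}s, column scan {col_s:.3f}s, {outliers.size} outliers")
  ```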
- Conference paper: Resolving temporal conflicts in inconsistent RDF knowledge bases (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Dylla, Maximilian; Sozio, Mauro; Theobald, Martin
  Recent trends in information extraction have allowed us to not only extract large semantic knowledge bases from structured or loosely structured Web sources, but to also extract additional annotations along with the RDF facts these knowledge bases contain. Among the most important types of annotations are spatial and temporal annotations. In particular, the temporal annotations help us to reflect that the majority of facts are not static but highly ephemeral in the real world, i.e., facts are valid for only a limited amount of time, or multiple facts stand in temporal dependencies with each other. In this paper, we present a declarative reasoning framework to express and process temporal consistency constraints and queries via first-order logical predicates. We define a subclass of first-order constraints with temporal predicates for which the knowledge base is guaranteed to be satisfiable. Moreover, we devise efficient grounding and approximation algorithms for this class of first-order constraints, which can be solved within our framework. Specifically, we reduce the problem of finding a consistent subset of time-annotated facts to a scheduling problem and give an approximation algorithm for it. Experiments over a large temporal knowledge base (T-YAGO) demonstrate the scalability and excellent approximation performance of our framework.
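  The reduction to scheduling can be illustrated with a toy analogue (not the paper's actual algorithm): under a constraint such as "a person holds at most one office at a time", time-annotated facts become intervals, and selecting a maximum consistent subset is interval scheduling, solved exactly by the earliest-end-time greedy.

  ```python
  # Toy analogue: resolve temporal conflicts among time-annotated facts for one
  # subject under a "no two facts may overlap in time" constraint. Hypothetical data.
  facts = [  # (subject, fact, valid_from, valid_to) -- years
      ("A", "mayorOf X", 2001, 2006),
      ("A", "mayorOf Y", 2004, 2008),   # overlaps the first fact -> conflict
      ("A", "mayorOf Z", 2008, 2012),
  ]

  def consistent_subset(facts):
      chosen, last_end = [], float("-inf")
      for f in sorted(facts, key=lambda f: f[3]):   # earliest end time first
          if f[2] >= last_end:                      # no overlap with chosen facts
              chosen.append(f)
              last_end = f[3]
      return chosen

  print(consistent_subset(facts))  # keeps the first and third fact
  ```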
- Conference paper: SSD ≠ SSD – an empirical study to identify common properties and type-specific behavior (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Hudlet, Volker; Schall, Daniel
  Solid-state disks promise high access speed at low energy consumption. While the basic technology for SSDs – flash memory – is well established, new product models are constantly emerging. With each new SSD generation, their behavior patterns change significantly, making it difficult to identify characteristics of SSDs in general. In this paper, we conduct empirical, database-centric performance measurements for SSDs, explain the results, and try to derive common characteristics. Comparing our measurement results, we find no ground truth valid for all solid-state disks. Furthermore, we show that a number of prevalent assumptions about SSDs, on which several SSD-specific DBMS optimizations are based, have become questionable. As a consequence of these findings, tailor-made DBMS algorithms for specific SSD types may be unsuitable, and optimal use of SSD technology in a DBMS context may require careful design and rather adaptive algorithms.
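  A minimal sketch of the kind of database-centric measurement such a study performs (not the authors' benchmark): comparing sequential with random block reads on one device. A real measurement must bypass the OS page cache (e.g., via O_DIRECT) and repeat runs; this sketch omits both.

  ```python
  # Sequential vs. random 4 KiB reads over a hypothetical 16 MiB test file.
  import os, random, time

  PATH, BLOCK, COUNT = "testfile.bin", 4096, 4096

  with open(PATH, "wb") as f:                        # create the test file
      f.write(os.urandom(BLOCK * COUNT))

  def read_blocks(offsets):
      t = time.time()
      with open(PATH, "rb") as f:
          for off in offsets:
              f.seek(off)
              f.read(BLOCK)
      return time.time() - t

  seq = read_blocks([i * BLOCK for i in range(COUNT)])
  rnd = read_blocks(random.sample([i * BLOCK for i in range(COUNT)], COUNT))
  print(f"sequential {seq:.3f}s, random {rnd:.3f}s")  # on SSDs the gap shrinks,
                                                      # but by how much varies per model
  ```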
- Conference paper: Technical introduction to the IBM Smart Analytics Optimizer for DB2 for System z (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Hrle, Namik; Draese, Oliver
- Conference paper: QSQLp: Eine Erweiterung der probabilistischen Many-World-Semantik um Relevanzwahrscheinlichkeiten (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Lehrack, Sebastian; Saretz, Sascha; Schmitt, Ingo
  Traditional evaluation of a database query assigns each tuple either the truth value true or false. For many application scenarios this evaluation semantics is too restrictive, in particular when a more differentiated query result is needed. An established probabilistic approach to achieving this differentiation is the use of so-called relevance probabilities: with what probability is a document or data object relevant to a given query? Besides the IR-motivated relevance probabilities, the field of probabilistic databases has established itself in database research. Here, too, a strict, deterministic evaluation model is no longer considered sufficient. Probabilistic database systems therefore manage several possible states of one and the same system in a common database. The present work combines these two probabilistic approaches into a semantically richer query model.
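  A toy illustration of the two probability sources the abstract combines (not QSQLp's actual semantics): a tuple's existence probability under many-worlds semantics and its IR-style relevance probability for a query, here simply multiplied under a naive independence assumption.

  ```python
  # Ranking tuples by combined existence and relevance probability (toy data).
  tuples = [  # (id, existence probability, relevance probability for query q)
      ("t1", 0.9, 0.8),
      ("t2", 0.6, 0.95),
      ("t3", 1.0, 0.3),
  ]

  results = sorted(((tid, p_exist * p_rel) for tid, p_exist, p_rel in tuples),
                   key=lambda r: r[1], reverse=True)
  for tid, p in results:
      print(f"{tid}: {p:.2f}")   # t1: 0.72, t2: 0.57, t3: 0.30
  ```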
- Conference paper: Panel: “One size fits all”: an idea whose time has come and gone? (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Dittrich, Jens; Färber, Franz; Graefe, Goetz; Loeser, Henrik; Reimann, Wilfried
- Conference paper: MapReduce and PACT - comparing data parallel programming models (Datenbanksysteme für Business, Technologie und Web (BTW), 2011)
  Alexandrov, Alexander; Ewen, Stephan; Heimel, Max; Hueske, Fabian; Kao, Odej; Markl, Volker; Nijkamp, Erik; Warneke, Daniel
  Web-scale analytical processing is a much investigated topic in current research. Next to parallel databases, new flavors of parallel data processors have recently emerged. One of the most discussed approaches is MapReduce. MapReduce is highlighted by its programming model: all programs expressed as the second-order functions map and reduce can be automatically parallelized. Although MapReduce provides a valuable abstraction for parallel programming, it clearly has some deficiencies. These become obvious when considering the tricks one has to play to express more complex tasks in MapReduce, such as operations with multiple inputs. The Nephele/PACT system uses a programming model that pushes the idea of MapReduce further. It is centered around so-called Parallelization Contracts (PACTs), which are in many cases better suited to express complex operations than plain MapReduce. By virtue of that programming model, the system can also apply a series of optimizations on the data flows before they are executed by the Nephele runtime system. This paper compares the PACT programming model with MapReduce from the perspective of the programmer, who specifies analytical data processing tasks. We discuss the implementations of several typical analytical operations both with MapReduce and with PACTs, highlighting the key differences in using the two programming models.
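  A sketch of the kind of "trick" the abstract alludes to: MapReduce has only one logical input, so a two-input operation such as a join must tag each record with its source relation in map() and reassemble pairs in reduce(), whereas PACT offers multi-input contracts for this. The following is a toy in-memory MapReduce, not Hadoop or Nephele code.

  ```python
  # Reduce-side join expressed in the MapReduce model (illustrative, toy data).
  from collections import defaultdict

  orders    = [(1, "o100"), (2, "o101")]          # (customer_id, order_id)
  customers = [(1, "Alice"), (2, "Bob")]          # (customer_id, name)

  def map_tagged(source, records):
      for key, value in records:
          yield key, (source, value)              # tag: which relation it came from

  def reduce_join(key, values):
      left  = [v for s, v in values if s == "customers"]
      right = [v for s, v in values if s == "orders"]
      for name in left:
          for oid in right:
              yield key, name, oid

  groups = defaultdict(list)                      # the shuffle phase
  pairs = list(map_tagged("customers", customers)) + list(map_tagged("orders", orders))
  for k, v in pairs:
      groups[k].append(v)

  for k in groups:
      for row in reduce_join(k, groups[k]):
          print(row)                              # (1, 'Alice', 'o100'), (2, 'Bob', 'o101')
  ```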