P214 - BTW2013 - Datenbanksysteme für Business, Technologie und Web
Listing of P214 - BTW2013 - Datenbanksysteme für Business, Technologie und Web by title
- Conference paper: Applying Stratosphere for big data analytics (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Leich, Marcus; Adamek, Jochen; Schubotz, Moritz; Heise, Arvid; Rheinländer, Astrid; Markl, Volker.
  Analyzing big data sets as they occur in modern business and science applications requires query languages that allow for the specification of complex data processing tasks. Moreover, these ideally declarative query specifications have to be optimized, parallelized, and scheduled for processing on massively parallel data processing platforms. This paper demonstrates the application of Stratosphere to different kinds of big data analytics tasks. Using examples from different application domains, we show how to formulate analytical tasks as Meteor queries and execute them with Stratosphere. These examples include data cleansing and information extraction tasks, and a correlation analysis of microblogging and stock trade volume data that we describe in detail in this paper.
- Conference paper: Composition methods for link discovery (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Hartung, Michael; Groß, Anika; Rahm, Erhard.
  The Linked Open Data (LOD) community publishes an increasing number of data sources on the so-called Data Web and interlinks them to support data integration applications. We investigate how the composition of existing links and mappings can help discover new links and mappings between LOD sources. Often there are many alternatives for composition, which raises the question of which paths provide the best linking results with the least computational effort. We therefore investigate different methods to select and combine the most suitable mapping paths. We also propose an approach for selecting and composing individual links instead of entire mappings. We comparatively evaluate the methods on several real-world linking problems from the LOD cloud. The results show the high value of reusing and composing existing links as well as the high effectiveness of our methods.
- Edited book
- Conference paper: Datenmanagementpatterns in Simulationsworkflows (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Reimann, Peter; Schwarz, Holger.
  Simulation workflows often have to process large volumes of data that exist in a variety of proprietary formats. For the programs and services integrated into the workflow to process these data, the data must be transformed into suitable formats. This increases the complexity of workflow modeling, which is usually done by the scientists themselves. As a result, they can concentrate less on the core of the actual simulation. To remedy this deficit, we propose an approach with which the data provisioning activities in simulation processes can be modeled abstractly. Scientists should not describe implementation details, but only the core aspects of data provisioning in the form of patterns. The patterns should be specified, as far as possible, in the language of the mathematical simulation models with which scientists are familiar. An extension of the workflow system automatically maps the patterns onto executable workflow fragments that implement the data provisioning. All of this reduces the complexity of modeling simulation workflows and increases the scientists' productivity.
- Conference paper: Datensicherheit in mandantenfähigen Cloud Umgebungen (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Waizenegger, Tim; Schiller, Oliver; Mega, Cataldo.
  Cloud computing is currently used mainly for scientific computing and consumer-oriented services, since cost savings are a particularly important factor there. However, operators of cloud platforms are increasingly interested in offering cloud services in the enterprise segment as well, in order to benefit from cost advantages there too. Customer response from this segment, however, leaves much to be desired. The reasons are concerns about data security and confidentiality in multi-tenant systems. To address this problem, we examined the challenges of securing multi-tenant cloud services and identified the handling of confidential key material and credentials as a weak point. This paper presents a conceptual solution for the central storage and access management of sensitive data, as well as its prototype implementation within the IBM cloud solution SmartCloud Content Management.
- Conference paper: Demonstrating near real-time analytics with IBM DB2 analytics accelerator (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Martin, Daniel; Ivanova, Iliyana; Mueller, Raphael; Velez Montoya, Luis Eduardo; Maruschka, Klaus.
  Version 3 of the IBM DB2 Analytics Accelerator (IDAA) takes a major step towards the vision of a universal relational DBMS that transparently processes both OLTP and analytical-type queries in a single system. Based on heuristics in DB2 for z/OS, the DB2 optimizer decides whether a query should be executed by "mainline" DB2 or whether it is beneficial to forward it to the attached IBM DB2 Analytics Accelerator, which operates on copies of the DB2 tables. The new "incremental update" functionality keeps these copy tables in sync by employing replication technology that monitors the DB2 transaction log and asynchronously applies the changes in micro-batches to IDAA. This enables near real-time analytics over online data, effectively marrying traditionally separated OLTP and data warehouse environments. With IDAA, reports can access data that is constantly refreshed, in contrast to traditional warehouses that are updated on a daily or even weekly basis. Without any changes to the applications and without the need to introduce cross-system ETL flows, an existing OLTP environment can be used for reporting purposes as well. In this demo, we present a near real-time reporting application modeled on an industry benchmark (TPC-DS), but with a constantly changing set of tables with over 800 million rows that is running on DB2 for z/OS. In a browser-based user interface, demo attendees can influence the rate of changes to the tables and observe how the reporting queries capture new data as it is being modified by a separately running OLTP workload generator.
- Conference paper: Detecting plagiarism in text documents through grammar-analysis of authors (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Tschuggnall, Michael; Specht, Günther.
  The task of intrinsic plagiarism detection is to find plagiarized sections within text documents without using a reference corpus. This paper presents the intrinsic detection approach Plag-Inn, which is based on the assumption that authors use a recognizable and distinguishable grammar to construct sentences. The main idea is to analyze the grammar of text documents and to find irregularities within the syntax of sentences, regardless of the concrete words used. If suspicious sentences are found by computing the pq-gram distance of grammar trees and by utilizing a Gaussian normal distribution, the algorithm tries to select and combine those sentences into potentially plagiarized sections. The parameters and thresholds needed by the algorithm are optimized using genetic algorithms. Finally, the approach is evaluated against a large test corpus consisting of English documents, showing promising results.
- Conference paper: DrillBeyond: open-world SQL queries using web tables (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Eberius, Julian; Thiele, Maik; Braunschweig, Katrin; Lehner, Wolfgang.
  The Web consists of a huge number of documents, but also large amounts of structured information, for example in the form of HTML tables containing relational-style data. One typical usage scenario for this kind of data is its integration into a database or data warehouse in order to apply data analytics. However, today's business intelligence tools evidently lack support for so-called situational or ad-hoc data integration. In this demonstration we will therefore present DrillBeyond, a novel database and information retrieval engine that allows users to query a local database as well as web datasets in a seamless and integrated way with standard SQL. The audience will be able to pose queries to our DrillBeyond system, which will be answered partly from local data in the database and partly from datasets that originate from the Web of Data. We will demonstrate the integration of the web tables back into the DBMS in order to apply its analytical features.
- Conference paper: Duplicate detection on GPUs (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Forchhammer, Benedikt; Papenbrock, Thorsten; Stening, Thomas; Viehmeier, Sven; Draisbach, Uwe; Naumann, Felix.
  With the ever-increasing volume of data and the ability to integrate different data sources, data quality problems abound. Duplicate detection, as an integral part of data cleansing, is essential in modern information systems. We present a complete duplicate detection workflow that utilizes the capabilities of modern graphics processing units (GPUs) to increase the efficiency of finding duplicates in very large datasets. Our solution covers several well-known algorithms for pair selection, attribute-wise similarity comparison, record-wise similarity aggregation, and clustering. We redesigned these algorithms to run memory-efficiently and in parallel on the GPU. Our experiments demonstrate that the GPU-based workflow is able to outperform a CPU-based implementation on large, real-world datasets. For instance, the GPU-based algorithm deduplicates a dataset with 1.8 million entities 10 times faster than a common CPU-based algorithm using comparably priced hardware.
- Conference paper: EvenPers: event-based person exploration and correlation (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Kapp, Christian; Strötgen, Jannik; Gertz, Michael.
  Searching for people on the Internet is one of the most frequent search activities. In this paper, we present EvenPers, a system for the event-based exploration of persons and person similarities. We address challenges such as cross-document person name normalization and present a novel approach to calculate person similarities based on their event information. In our demonstration, we show several exploration scenarios illustrating the usefulness of EvenPers and its exciting functionality.
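
The "Composition methods for link discovery" entry in this listing describes deriving new links between two sources by composing existing mappings through an intermediate source. The following is a minimal sketch of that general idea only, not the authors' method: the function name `compose`, the dictionary representation of a mapping, the product rule for path confidence, and the keep-the-maximum rule for competing paths are all assumptions chosen for illustration.

```python
from collections import defaultdict


def compose(map_ab, map_bc):
    """Compose link mappings A->B and B->C into a derived mapping A->C.

    Each mapping is a dict {(source_id, target_id): similarity in [0, 1]}.
    Assumptions (not taken from the paper): a path's confidence is the
    product of its two link similarities, and when several paths connect
    the same pair, the highest-scoring path wins.
    """
    # Index the second mapping by its source entity for fast joins.
    links_from_b = defaultdict(list)
    for (b, c), sim_bc in map_bc.items():
        links_from_b[b].append((c, sim_bc))

    composed = {}
    for (a, b), sim_ab in map_ab.items():
        for c, sim_bc in links_from_b.get(b, []):
            score = sim_ab * sim_bc
            if score > composed.get((a, c), 0.0):
                composed[(a, c)] = score
    return composed


# Hypothetical entity identifiers, for illustration only.
example_ab = {("a1", "b1"): 1.0, ("a1", "b2"): 0.5, ("a2", "b2"): 0.8}
example_bc = {("b1", "c1"): 0.9, ("b2", "c1"): 1.0, ("b2", "c2"): 0.5}
example_ac = compose(example_ab, example_bc)
```

Here `a1` reaches `c1` over two paths (via `b1` and via `b2`), and the sketch keeps only the stronger one. Other aggregation rules, such as averaging or counting supporting paths, would be equally plausible choices under different assumptions.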