P289 - BTW2019 - Datenbanksysteme für Business, Technologie und Web
- Perceptual Relational Attributes: Navigating and Discovering Shared Perspectives from User-Generated Reviews (BTW 2019, 2019)
  Lofi, Christoph; Valle Torre, Manuel; Ye, Mengmeng
  Effectively modelling and querying experience items like movies, books, or games in databases is challenging because these items are better described by their resulting user experience or perceived properties than by factual attributes. However, such information is often subjective, disputed, or unclear. Thus, social judgments like comments, reviews, discussions, or ratings have become a ubiquitous component of most Web applications dealing with such items, especially in the e-commerce domain. However, they usually do not play a major role in the query process and are typically just shown to the user. In this paper, we discuss how to use unstructured user reviews to build a structured semantic representation of database items such that these perceptual attributes are (at least implicitly) represented and usable for navigational queries. In particular, we argue that a central challenge when extracting perceptual attributes from social judgments is respecting the subjectivity of expressed opinions. We claim that no representation consisting of only a single tuple will be sufficient. Instead, such systems should aim at discovering shared perspectives, representing dominant perceptions and opinions, and exploiting those perspectives for query processing.
- Twoogle: Searching Twitter With MongoDB Queries (BTW 2019, 2019)
  Wingerath, Wolfram; Gessert, Felix; Ritter, Norbert
  Modern real-time databases follow the same collection-based querying semantics as traditional database systems. Targeting interactive workloads, real-time databases not only deliver a query’s result upon request, but also produce a continuous stream of informational updates thereafter. In theory, building interactive, reactive, or collaborative applications should thus be simple with collection-based real-time queries, as they bridge the gap between traditional database queries over static collections and continuous queries over dynamic data streams. In practice, though, building real-time applications is still considered challenging, since most real-time databases today provide only poor scalability, confusing interfaces for real-time data access, and reduced query expressiveness in comparison to their pull-based counterparts. In this demo, we illustrate that scalability, query expressiveness, and simplicity can go hand in hand for modern real-time databases. To this end, we present the social media search app Twoogle, which is built on top of Baqend’s real-time query API.
- Fast Approximated Nearest Neighbor Joins For Relational Database Systems (BTW 2019, 2019)
  Günther, Michael; Thiele, Maik; Lehner, Wolfgang
  K nearest neighbor search (kNN-Search) is a universal data processing technique and a fundamental operation for word embeddings trained by word2vec or related approaches. The benefits of operations on dense vectors like word embeddings for analytical functionalities of RDBMSs motivate an integration of kNN-Joins. However, kNN-Search, as well as kNN-Joins, have barely been integrated into relational database systems so far. In this paper, we develop an index structure for approximated kNN-Joins working well on high-dimensional data and provide an integration into PostgreSQL. The novel index structure is efficient for different cardinalities of the involved join partners. An evaluation of the system based on applications on word embeddings shows the benefits of such an integrated kNN-Join operation and the performance of the proposed approach.
- Graph Data Transformations in Gradoop (BTW 2019, 2019)
  Kricke, Matthias; Peukert, Eric; Rahm, Erhard
  The analysis of graph data using graph database and distributed graph processing systems has gained significant interest. However, relatively little effort has been devoted to preparing the graph data for analysis, in particular to transform and integrate data from different sources. To support such ETL processes for graph data we investigate transformation operations for property graphs managed by the distributed platform Gradoop. We also provide initial results of a runtime evaluation of the proposed graph data transformations.
- When is Harry Potters birthday? – Question Answering on Linked Data (BTW 2019, 2019)
  Steinmetz, Nadine; Arning, Ann-Katrin; Sattler, Kai-Uwe
  Question Answering (QA) systems provide a user-friendly way to access any type of knowledge base and, especially for Linked Data sources, to gain insight into semantic information. But understanding natural language, transferring it to a formal query, and finding the correct answer is a complex task. The challenge is even harder when the QA system aims to be easily adaptable regarding the underlying information. This goal can be achieved by an approach that is independent of the knowledge base. The respective data can then be replaced or updated without changes to the system itself. In this paper we present our QA approach and the demonstrator, which is able to consume natural language questions of general knowledge (not specific to a topic or domain).
- In-Database Machine Learning: Gradient Descent and Tensor Algebra for Main Memory Database Systems (BTW 2019, 2019)
  Schüle, Maximilian; Simonis, Frédéric; Heyenbrock, Thomas; Kemper, Alfons; Günnemann, Stephan; Neumann, Thomas
  Machine learning tasks such as regression, clustering, and classification are typically performed outside of database systems using dedicated tools, necessitating the extraction, transformation, and loading of data. We argue that database systems, when extended to enable automatic differentiation, gradient descent, and tensor algebra, are capable of solving machine learning tasks more efficiently by eliminating the need for costly data communication. We demonstrate our claim by implementing tensor algebra and stochastic gradient descent, using lambda expressions for loss functions, as a pipelined operator in a main memory database system. Our approach enables common machine learning tasks to be performed faster than in extended disk-based database systems, and as well as with dedicated tools, by eliminating the time needed for data extraction. This work aims to incorporate gradient descent and tensor data types into database systems, allowing them to handle a wider range of computational tasks.
- Architectural Principles for Database Systems on Storage-Class Memory (BTW 2019, 2019)
  Oukid, Ismail
  Storage-Class Memory (SCM) is a novel class of memory technologies that combine the byte addressability and low latency of DRAM with the density and non-volatility of traditional storage media. Hence, SCM can serve as persistent main memory, i.e., as main memory and storage at the same time. In this thesis, we dissect the challenges and pursue the opportunities brought by SCM to database systems. To solve the identified challenges, we devise necessary building blocks for enabling SCM-based database systems, namely memory management, data structures, transaction concurrency control, recovery techniques, and a testing framework against new failure scenarios stemming from SCM. Thereafter, we leverage these building blocks to build SOFORT, a novel hybrid SCM-DRAM transactional storage engine that places data, accesses it, and updates it directly in SCM, thereby doing away with traditional write-ahead logging and achieving near-instant recovery.
- Ganzheitliches Metadatenmanagement im Data Lake: Anforderungen, IT-Werkzeuge und Herausforderungen in der Praxis [Holistic Metadata Management in the Data Lake: Requirements, IT Tools, and Challenges in Practice] (BTW 2019, 2019)
  Gröger, Christoph; Hoos, Eva
  Data lakes have become established in industrial practice as platforms for storing and analyzing all kinds of (raw) data. Extended requirements regarding governance and self-service make metadata management in the data lake a critical success factor. So far, however, there has been little scientific work on the subject; in particular, a holistic view of the design and realization of metadata management in the data lake is lacking. This paper addresses the topic and is based on practical experience from an industrial corporation in building an enterprise-wide data lake. Practical requirements and use cases for metadata management in the data lake are discussed, and the different kinds of metadata are analyzed using the practical example. To implement metadata management, different IT tools are then analyzed against defined criteria. The analysis shows that data catalogs are in principle the suitable type of tool, although technical shortcomings still exist. Finally, the challenges that exist in practice for holistic metadata management in the data lake are summarized and future research needs are identified.
- From LEGO to the Shopfloor: Driving Digitalization Through Process Technology (BTW 2019, 2019)
  Rinderle-Ma, Stefanie
  Processes constitute the major vehicle for helping companies move towards digital business (models). One major application area is smart manufacturing, also referred to as Industrie 4.0. Here, processes connect and integrate machines, actors, sensors, information systems, and business partners, collecting, processing, and exchanging production-relevant data. This yields manifold benefits such as optimized processing and integrated data analysis. Designing and implementing such solutions raises many research challenges, including the flexible and robust implementation of distributed process networks and the application and extension of process-oriented data analysis methods. We illustrate these challenges by our own journey from a LEGO-based Industrie 4.0 lab setting to the centurio.work manufacturing orchestration suite.
- Machine Learning Applied to the Clerical Task Management Problem in Master Data Management Systems (BTW 2019, 2019)
  Oberhofer, Martin; Bremer, Lars; Chkalova, Mariya
  Clerical tasks are created if a duplicate detection algorithm detects some similarity between records but not enough to allow an auto-merge operation. Data stewards review clerical tasks and make a final match or non-match decision. In this paper we evaluate different machine learning algorithms regarding their accuracy in predicting the correct action for a clerical task, and execute that action automatically if the prediction has sufficient confidence. This approach reduces the amount of work for data stewards by orders of magnitude.
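
The confidence-gated automation described in the clerical task entry (Oberhofer et al.) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Prediction` record, the `route` function, and the threshold value of 0.95 do not come from the paper, which does not state a threshold or a data model in its abstract.

```python
from dataclasses import dataclass

# Hypothetical threshold; the abstract does not state a value.
CONFIDENCE_THRESHOLD = 0.95


@dataclass
class Prediction:
    task_id: str
    action: str        # "match" or "non-match"
    confidence: float  # classifier's probability for the predicted action


def route(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split predictions into auto-executed actions and tasks left for data stewards."""
    automated, manual = [], []
    for p in predictions:
        # Only sufficiently confident predictions are executed automatically.
        (automated if p.confidence >= threshold else manual).append(p)
    return automated, manual


predictions = [
    Prediction("t1", "match", 0.99),
    Prediction("t2", "non-match", 0.97),
    Prediction("t3", "match", 0.60),
]
automated, manual = route(predictions)
```

With these toy values, `t1` and `t2` would be handled automatically and `t3` would remain a clerical task for a data steward. Raising the threshold trades fewer automated decisions for a lower risk of wrong merges.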
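
To make the kNN-Join operation from the Günther et al. entry concrete, here is a minimal exact, brute-force sketch. It is not the approximated index structure the paper develops and has nothing to do with its PostgreSQL integration; the function names and the two-dimensional toy vectors are assumptions chosen only to show what the join computes.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_join(queries, targets, k):
    """For every query vector, return the keys of the k most similar targets.

    Exact and quadratic: every query is compared with every target.
    """
    result = {}
    for q_key, q_vec in queries.items():
        scored = (
            (cosine_similarity(q_vec, t_vec), t_key)
            for t_key, t_vec in targets.items()
        )
        result[q_key] = [t_key for _, t_key in heapq.nlargest(k, scored)]
    return result


# Toy vectors, not real word embeddings.
queries = {"king": [0.9, 0.1], "apple": [0.1, 0.9]}
targets = {"queen": [0.8, 0.2], "pear": [0.2, 0.8], "car": [0.7, 0.7]}
neighbors = knn_join(queries, targets, k=2)
```

The quadratic cost of this exact join on large, high-dimensional inputs is the kind of overhead that an approximated index structure is meant to avoid.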
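
The idea of passing a loss function as a lambda expression to a gradient descent routine, mentioned in the Schüle et al. entry, can be sketched outside a database as follows. This is a simplified stand-in under stated assumptions: it uses full-batch gradient descent with numerical (finite-difference) gradients rather than the stochastic gradient descent and automatic differentiation named in the abstract, and all names, data, and hyperparameters are hypothetical.

```python
def numerical_gradient(loss, params, eps=1e-6):
    """Approximate the gradient of loss at params by central differences."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad


def gradient_descent(loss, params, learning_rate=0.05, steps=2000):
    """Repeatedly step against the gradient of a user-supplied loss."""
    params = list(params)
    for _ in range(steps):
        grad = numerical_gradient(loss, params)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# The loss is supplied as a lambda over the weights w = [slope, intercept].
mse = lambda w: sum((w[0] * x + w[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

a, b = gradient_descent(mse, [0.0, 0.0])
```

On this toy data the fitted slope and intercept approach 2 and 1. Keeping the loss as a separate callable is what lets the same descent routine serve different models.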