P289 - BTW2019 - Datenbanksysteme für Business, Technologie und Web
Listing by title: entries 1 - 10 of 47
- Architectural Principles for Database Systems on Storage-Class Memory (BTW 2019, 2019)
  Oukid, Ismail
  Storage-Class Memory (SCM) is a novel class of memory technologies that combine the byte addressability and low latency of DRAM with the density and non-volatility of traditional storage media. Hence, SCM can serve as persistent main memory, i.e., as main memory and storage at the same time. In this thesis, we dissect the challenges and pursue the opportunities that SCM brings to database systems. To solve the identified challenges, we devise the building blocks necessary for SCM-based database systems, namely memory management, data structures, transaction concurrency control, recovery techniques, and a testing framework for the new failure scenarios that SCM introduces. We then leverage these building blocks to build SOFORT, a novel hybrid SCM-DRAM transactional storage engine that places, accesses, and updates data directly in SCM, thereby doing away with traditional write-ahead logging and achieving near-instant recovery.
- The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution (BTW 2019, 2019)
  Chen, Xiao; Campero Durand, Gabriel; Zoun, Roman; Broneske, David; Li, Yang; Saake, Gunter
  Word embedding has recently become a beneficial technique for diverse natural language processing tasks, especially since the introduction of several popular neural word embedding models such as word2vec, GloVe, and FastText. Entity resolution (ER), i.e., the task of identifying digital records that refer to the same real-world entity, has also been shown to benefit from word embedding. However, word embeddings are not a one-size-fits-all solution, because they cannot provide accurate results for values without semantic meaning, such as numerical values. In this paper, we propose combining general word embeddings with traditional hand-picked similarity measures for ER tasks, selecting the most suitable similarity measure for each attribute based on its properties. We provide guidelines on how to choose suitable similarity measures for different types of attributes and evaluate the proposed hybrid method on both synthetic and real datasets. Experiments show that a hybrid method that correctly selects the required similarity measures can outperform methods that rely purely on either traditional or word-embedding-based similarity measures.
- Big graph analysis by visually created workflows (BTW 2019, 2019)
  Rostami, M. Ali; Peukert, Eric; Wilke, Moritz; Rahm, Erhard
  The analysis of large graphs has recently received considerable attention, but current solutions are typically hard to use. In this demonstration paper, we report on an effort to improve the usability of the open-source system Gradoop for processing and analyzing large graphs. This is achieved by integrating Gradoop into the popular open-source software KNIME so that graph analysis workflows can be created visually, without the need for coding. We outline the integration approach and discuss what will be demonstrated.
- Blockchain in the Context of Business Applications and Enterprise Databases (BTW 2019, 2019)
  Renkes, Frank; Sommer, Christian
  Blockchain seems to be the future of all cross-company business applications. Just as machine learning is being adopted into all new and existing business applications and processes, we can see the same trend for blockchain: nearly every application tries to leverage blockchain technology to improve its related process chains. Is this just hype, or is blockchain really the solution to all problems in which applications rely on intelligent and secure data distribution and sharing? What are the most relevant qualities of blockchain needed in modern business applications, and which role can a traditional database play here? Wouldn’t integrating some of these qualities into traditional databases be a better approach to building so-called ‘distributed business applications’? What is the relationship and overlap between core blockchain and core database concepts such as (redo) logging, security features like auditing and encryption, distributed (query) processing, and stored procedures/smart contracts? This talk discusses how blockchain can be integrated into existing business applications and processes, what the biggest challenges are, and which role a traditional database can play in this context.
- The Borda Social Choice Movie Recommender (BTW 2019, 2019)
  Kastner, Johannes; Ranitovic, Nemanja; Endres, Markus
  In this demo paper, we present a recommender system that exploits the Borda social choice voting rule for clustering recommendations in order to produce results that are comprehensible to the user. Compared with existing clustering techniques such as k-means, it avoids the overhead of normalizing and preparing the user preference data. In our demo, we enable a comparison of our clustering approach with the well-known k-means++ using traditional distance measures.
- BTW2019 - Datenbanksysteme für Business, Technologie und Web (BTW 2019, 2019)
  Grust, Torsten; Naumann, Felix; Böhm, Alexander; Lehner, Wolfgang; Härder, Theo; Rahm, Erhard; Heuer, Andreas; Klettke, Meike; Meyer, Holger
- Building Scalable Machine Learning Solutions for Data Cleaning (BTW 2019, 2019)
  Ilyas, Ihab
  Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details of configuring and deploying ML techniques are the biggest hurdle. In this talk, I discuss why leveraging data semantics and domain-specific knowledge is key to delivering the optimizations necessary for truly scalable ML curation solutions. The talk focuses on two main problems: (1) entity consolidation, which is arguably the most difficult data curation challenge because it is notoriously complex and hard to scale; and (2) using probabilistic inference to suggest repairs for identified errors and anomalies with our new system, HoloClean. Both problems have challenged researchers and practitioners for decades due to the fundamental combinatorial explosion of the solution space and the lack of ground truth. There’s a large body of work on these problems by both academia and industry. Techniques have included human curation, rules-based systems, and automatic discovery of clusters using predefined thresholds on record similarity. Unfortunately, none of these techniques alone has been able to provide sufficient accuracy and scalability. The talk aims to provide deeper insight into the entity consolidation and data repair problems and discusses how machine learning, human expertise, and problem semantics can collectively deliver a scalable, high-accuracy solution.
- Data Profiling – Effiziente Entdeckung Struktureller Abhängigkeiten [Data Profiling: Efficient Discovery of Structural Dependencies] (BTW 2019, 2019)
  Papenbrock, Thorsten
  Data is an indispensable asset not only in computer science but also in many other scientific disciplines. It serves to exchange, link, and store knowledge and is therefore indispensable in research and industry. Unfortunately, data is often not documented well enough to be used directly: metadata describing the structure, and thus the access patterns, of the digital information is missing. Computer scientists and experts from other disciplines therefore spend a great deal of time analyzing and preparing data structurally. However, because discovering metadata is a highly complex task, many algorithmic approaches fail even on small datasets. In the dissertation on which this summary is based, we present three novel discovery algorithms for important and at the same time hard-to-find types of metadata: unique column combinations, functional dependencies, and inclusion dependencies. The proposed algorithms clearly outperform the previous state of the art in runtime and resource consumption and thus make considerably larger datasets usable. Because applying such algorithms is not easy for non-expert users, we additionally develop Metanome, a program for intuitive data analysis. Metanome offers not only the algorithms proposed in this work but also discovery algorithms for other types of metadata. Finally, using schema normalization as a use case, we demonstrate how the discovered metadata can be used effectively.
- Database-Supported Video Game Engines: Data-Driven Map Generation (BTW 2019, 2019)
  O'Grady, Daniel
  Video game engines can benefit greatly from being tightly coupled with database systems. To make this point and to exemplify the similarities between database and game engine technology, we demonstrate a data-driven approach to generating video game maps, expressed purely in SQL. The demonstration will feature a live, database-supported game that is playable on-site.
- DICE: Density-based Interactive Clustering and Exploration (BTW 2019, 2019)
  Kazempour, Daniyal; Kazakov, Maksim; Kröger, Peer; Seidl, Thomas
  Clustering algorithms mostly follow a fixed pipeline: the user provides input data and hyperparameter values, the algorithm is executed, and the output is written to files or visualized. In this work, we present an early prototype of an interactive density-based clustering tool named DICE, in which users can change the hyperparameter settings and immediately observe the resulting clusters. Users can also browse through each detected cluster and obtain statistics as well as a convex hull profile for it. In addition, DICE keeps track of the chosen settings, enabling users to review which hyperparameter values they have previously chosen. DICE can be used not only for scientific data analysis but also in teaching, where students can learn in an exploratory fashion how a density-based clustering algorithm such as DBSCAN behaves.
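The DICE entry above centers on how DBSCAN's hyperparameters shape the result. Below is a minimal, textbook-style DBSCAN sketch in Python. It is not DICE's code, and the function names `dbscan` and `region_query` are illustrative. It re-runs on the same points with different `eps` values, which is the kind of comparison DICE makes interactive. Neighborhood queries are brute force, so the sketch is quadratic in the number of points.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster label per point (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:          # not a core point (for now)
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster           # border point: reachable, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))   # [0, 0, 0, 0, 1, 1, 1, -1]
print(dbscan(points, eps=0.5, min_pts=3))   # every point is noise
```

With `eps=1.5`, two dense groups and one outlier appear. Shrinking `eps` to 0.5 turns every point into noise, which shows how sensitive the outcome is to a single hyperparameter.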
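The SOFORT entry above describes how placing data directly in persistent memory can make write-ahead logging unnecessary. The following Python sketch simulates that idea. It is not SOFORT's actual design. The class `PersistentTable`, its methods, and the "write the record, persist it, then set a valid flag" protocol are all assumptions chosen for illustration. The `_persist` method only stands in for the cache-line flush and fence instructions that real SCM code needs. Recovery rebuilds the volatile DRAM index by scanning the flags, so there is no log to replay.

```python
from typing import Dict, List, Optional, Tuple


class PersistentTable:
    """Toy model of a table whose records live directly in SCM (simulated)."""

    def __init__(self, capacity: int) -> None:
        # "Persistent" state: survives a simulated crash.
        self.slots: List[Optional[Tuple[str, str]]] = [None] * capacity
        self.valid: List[bool] = [False] * capacity
        # Volatile DRAM index: lost on a crash, rebuilt on recovery.
        self.index: Dict[str, int] = {}

    def _persist(self) -> None:
        # Hypothetical stand-in for a cache-line flush plus store fence.
        pass

    def insert(self, key: str, value: str, crash_before_publish: bool = False) -> None:
        slot = self.valid.index(False)       # first free slot; ValueError if full
        self.slots[slot] = (key, value)      # 1. write the record in place
        self._persist()                      # 2. make the record durable
        if crash_before_publish:
            raise RuntimeError("simulated crash")
        self.valid[slot] = True              # 3. publish it with one small flag write
        self._persist()
        self.index[key] = slot               # 4. update the DRAM index

    def crash(self) -> None:
        self.index = {}                      # only volatile state is lost

    def recover(self) -> None:
        # No log replay: scan the valid flags and rebuild the DRAM index.
        self.index = {rec[0]: i for i, rec in enumerate(self.slots)
                      if self.valid[i] and rec is not None}

    def get(self, key: str) -> Optional[str]:
        slot = self.index.get(key)
        if slot is None:
            return None
        record = self.slots[slot]
        return None if record is None else record[1]


table = PersistentTable(capacity=4)
table.insert("a", "1")
try:
    table.insert("b", "2", crash_before_publish=True)
except RuntimeError:
    pass
table.crash()
table.recover()
print(table.get("a"), table.get("b"))  # 1 None
```

The record `b` was written but never published, so recovery ignores it and a later insert may reuse its slot.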
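The per-attribute selection idea from the entity resolution entry above can be sketched as follows. Everything in the sketch is hypothetical. The three-dimensional `EMBEDDINGS` table is made up, whereas a real system would load pre-trained word2vec, GloVe, or FastText vectors. The schema in `MEASURES`, the numeric measure, and the unweighted mean are also illustrative choices, not the paper's guidelines. The sketch does reproduce the abstract's point: an embedding measure says nothing about numerical values, because unknown tokens map to a zero vector.

```python
import math
from typing import Callable, Dict, List

# Toy word vectors, made up for illustration only.
EMBEDDINGS: Dict[str, List[float]] = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.85, 0.15, 0.05],
    "phone": [0.1, 0.9, 0.0],
    "black": [0.0, 0.1, 0.9],
}


def embed(text: str) -> List[float]:
    """Average the vectors of known words; zero vector if none are known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def embedding_sim(a: str, b: str) -> float:
    """Cosine similarity of averaged word vectors (for semantic text)."""
    u, v = embed(a), embed(b)
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def numeric_sim(a: str, b: str) -> float:
    """Hand-tuned measure for numbers: 1 minus the relative difference."""
    x, y = float(a), float(b)
    if x == y:
        return 1.0
    return max(0.0, 1.0 - abs(x - y) / max(abs(x), abs(y)))


def exact_sim(a: str, b: str) -> float:
    """Hand-tuned measure for codes without semantics (e.g. zip codes)."""
    return 1.0 if a.strip() == b.strip() else 0.0


# Hypothetical per-attribute choice of similarity measure.
MEASURES: Dict[str, Callable[[str, str], float]] = {
    "title": embedding_sim,
    "price": numeric_sim,
    "zip": exact_sim,
}


def record_sim(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Unweighted mean of the per-attribute similarities."""
    scores = [MEASURES[attr](r1[attr], r2[attr]) for attr in MEASURES]
    return sum(scores) / len(scores)


a = {"title": "black laptop", "price": "999", "zip": "39106"}
b = {"title": "black notebook", "price": "989", "zip": "39106"}
c = {"title": "black phone", "price": "499", "zip": "10115"}
print(record_sim(a, b) > 0.9, record_sim(a, c) < 0.6)  # True True
print(embedding_sim("999", "989"))  # 0.0: embeddings cannot compare numbers
```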
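The blockchain talk above asks how blockchain overlaps with database logging and auditing. One concrete overlap is the hash-chained, append-only log. Each entry commits to the hash of its predecessor, so silently altering history becomes detectable. The Python sketch below shows only that idea, and its helper names are hypothetical. It deliberately omits distribution and consensus, which are what separate a blockchain from a tamper-evident database log.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 over the predecessor's hash and a canonical JSON form of the payload."""
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append(log: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append an entry that commits to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute the chain; a modified entry is detected unless every
    later hash is recomputed as well."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log: List[Dict[str, Any]] = []
append(log, {"tx": 1, "from": "A", "to": "B", "amount": 100})
append(log, {"tx": 2, "from": "B", "to": "C", "amount": 40})
print(verify(log))                     # True
log[0]["payload"]["amount"] = 1000     # tamper with history
print(verify(log))                     # False
```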
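The recommender entry above relies on the Borda voting rule. For each ranking of n candidates, the rule awards n - 1 points to first place, n - 2 to second place, and so on, and then sums the points per candidate. The sketch below implements only this rule, applied to hypothetical user rankings of movies. The abstract does not describe how the paper uses Borda scores to cluster recommendations, so that step is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def borda(rankings: List[List[str]]) -> List[Tuple[str, int]]:
    """Borda count: in a ranking of n candidates, position i (0-based) earns n - 1 - i points."""
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    # Highest score first; ties broken alphabetically for a deterministic result.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


votes = [
    ["Alien", "Brazil", "Casablanca"],
    ["Brazil", "Casablanca", "Alien"],
    ["Brazil", "Alien", "Casablanca"],
]
print(borda(votes))  # [('Brazil', 5), ('Alien', 3), ('Casablanca', 1)]
```

Unlike k-means, the rule works directly on rankings, so no distance measure or feature normalization is involved.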
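The data cleaning talk above lists "automatic discovery of clusters using predefined thresholds on record similarity" among the traditional techniques. Below is a minimal sketch of that baseline. It assumes token Jaccard similarity and uses union-find to form connected components. Both choices are illustrative and are not HoloClean's method. The sketch also shows why the baseline falls short: "International Business Machines" shares no tokens with "IBM Corp New York", so it ends up in its own cluster.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lowercase token sets."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 1.0


def threshold_clusters(records: List[str], threshold: float) -> List[Set[int]]:
    """Link every pair at or above the threshold; return connected components."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: Dict[int, Set[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), set()).add(i)
    return sorted(clusters.values(), key=min)


records = [
    "IBM Corp New York",
    "IBM Corporation New York",
    "International Business Machines",
    "Apple Inc Cupertino",
]
print(threshold_clusters(records, threshold=0.5))  # [{0, 1}, {2}, {3}]
```

Lowering the threshold merges more records, but it also merges unrelated ones transitively. This tradeoff is one reason the talk argues that such techniques alone are insufficient.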
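The data profiling entry above names three metadata types with simple definitions. The Python sketch below checks them naively on a hypothetical table. The brute-force `minimal_uccs` search is exponential in the number of columns. It only illustrates what is being discovered; it is neither the dissertation's algorithms nor Metanome's API. In the example, the FD `zip -> city` holds in a table whose key is `id`. That is the kind of finding schema normalization would use to split out a separate zip/city table.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def is_unique(rows: List[Row], columns: Sequence[str]) -> bool:
    """Unique column combination (UCC): no two rows agree on all given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def holds_fd(rows: List[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Functional dependency lhs -> rhs: equal lhs values imply equal rhs values."""
    mapping: Dict[Tuple[str, ...], str] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if mapping.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def holds_ind(dep_rows: List[Row], dep_col: str, ref_rows: List[Row], ref_col: str) -> bool:
    """Inclusion dependency: every dep_col value also occurs in ref_col."""
    return {r[dep_col] for r in dep_rows} <= {r[ref_col] for r in ref_rows}


def minimal_uccs(rows: List[Row], columns: Sequence[str]) -> List[Tuple[str, ...]]:
    """Brute-force search over all column combinations, smallest first."""
    found: List[Tuple[str, ...]] = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(set(u) <= set(combo) for u in found):
                continue  # supersets of a UCC are unique but not minimal
            if is_unique(rows, combo):
                found.append(combo)
    return found


employees = [
    {"id": "1", "name": "Ada", "zip": "14482", "city": "Potsdam"},
    {"id": "2", "name": "Bob", "zip": "14482", "city": "Potsdam"},
    {"id": "3", "name": "Ada", "zip": "10115", "city": "Berlin"},
]
cities = [{"zip": "14482"}, {"zip": "10115"}, {"zip": "39106"}]

print(minimal_uccs(employees, ["id", "name", "zip", "city"]))
# [('id',), ('name', 'zip'), ('name', 'city')]
print(holds_fd(employees, ["zip"], "city"))            # True
print(holds_ind(employees, "zip", cities, "zip"))      # True
```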
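The map generation entry above expresses the generator purely in SQL. The sketch below gives a rough idea of what that can look like; it does not reproduce the demo's actual queries. It uses Python's built-in `sqlite3` module. A recursive common table expression enumerates the grid cells, and a hypothetical `terrain` rule table maps each cell's squared distance from the map center to a tile symbol. Python only renders the query result. Changing the rows of `terrain` changes the map without touching the query, which is what makes the approach data-driven.

```python
import sqlite3
from typing import List

# Hypothetical rule table: squared-distance ranges mapped to tile symbols.
SCHEMA = """
CREATE TABLE terrain (lo INTEGER, hi INTEGER, tile TEXT);
INSERT INTO terrain VALUES (0, 2, '^'), (3, 12, '.'), (13, 1000, '~');
"""

MAP_QUERY = """
WITH RECURSIVE
  xs(x) AS (SELECT 0 UNION ALL SELECT x + 1 FROM xs WHERE x < :w - 1),
  ys(y) AS (SELECT 0 UNION ALL SELECT y + 1 FROM ys WHERE y < :h - 1),
  cells AS (
    SELECT x, y, (x - :cx) * (x - :cx) + (y - :cy) * (y - :cy) AS d2
    FROM xs, ys
  )
SELECT c.x, c.y, t.tile
FROM cells AS c JOIN terrain AS t ON c.d2 BETWEEN t.lo AND t.hi
ORDER BY c.y, c.x
"""


def generate_map(w: int, h: int) -> List[str]:
    """Run the SQL generator and render its rows as text lines."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    params = {"w": w, "h": h, "cx": w // 2, "cy": h // 2}
    rows = conn.execute(MAP_QUERY, params).fetchall()
    conn.close()
    grid = [[" "] * w for _ in range(h)]
    for x, y, tile in rows:
        grid[y][x] = tile
    return ["".join(line) for line in grid]


for line in generate_map(9, 9):
    print(line)
```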