Listing by author "Naumann, Felix"
1 - 10 of 67
- Text document: Architectural Principles for Database Systems on Storage-Class Memory (BTW 2019, 2019), Oukid, Ismail. Storage-Class Memory (SCM) is a novel class of memory technologies that combine the byte addressability and low latency of DRAM with the density and non-volatility of traditional storage media. Hence, SCM can serve as persistent main memory, i.e., as main memory and storage at the same time. In this thesis, we dissect the challenges and pursue the opportunities brought by SCM to database systems. To solve the identified challenges, we devise the necessary building blocks for enabling SCM-based database systems, namely memory management, data structures, transaction concurrency control, recovery techniques, and a testing framework against new failure scenarios stemming from SCM. Thereafter, we leverage these building blocks to build SOFORT, a novel hybrid SCM-DRAM transactional storage engine that places data, accesses it, and updates it directly in SCM, thereby doing away with traditional write-ahead logging and achieving near-instant recovery.
- Text document: The Best of Both Worlds: Combining Hand-Tuned and Word-Embedding-Based Similarity Measures for Entity Resolution (BTW 2019, 2019), Chen, Xiao; Campero Durand, Gabriel; Zoun, Roman; Broneske, David; Li, Yang; Saake, Gunter. Recently, word embedding has become a beneficial technique for diverse natural language processing tasks, especially after the successful introduction of several popular neural word embedding models such as word2vec, GloVe, and FastText. Entity resolution, i.e., the task of identifying digital records that refer to the same real-world entity, has also been shown to benefit from word embeddings. However, word embeddings are not a one-size-fits-all solution, because they cannot provide accurate results for values without semantic meaning, such as numerical values. In this paper, we propose to combine general word embeddings with traditional hand-picked similarity measures for solving ER tasks, aiming to select the most suitable similarity measure for each attribute based on its properties. We provide guidelines on how to choose suitable similarity measures for different types of attributes and evaluate our proposed hybrid method on both synthetic and real datasets. Experiments show that a hybrid method that correctly selects the required similarity measures can outperform methods that purely adopt traditional or word-embedding-based similarity measures.
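The per-attribute measure selection described in this entry can be illustrated with a toy sketch. This is a hypothetical illustration, not the paper's actual method: token-set Jaccard stands in for the embedding-based measure (which would require a trained model), and all attribute and function names are invented.

```python
def numeric_similarity(a, b):
    """Numeric attributes carry no semantic signal for embeddings,
    so use closeness relative to magnitude instead."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(abs(a), abs(b))

def token_jaccard(a, b):
    """Stand-in for an embedding-based measure on textual attributes."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def record_similarity(r1, r2, schema):
    """Average per-attribute similarities, each computed with the
    measure chosen for that attribute's declared type."""
    measures = {"numeric": numeric_similarity, "text": token_jaccard}
    sims = [measures[t](r1[attr], r2[attr]) for attr, t in schema.items()]
    return sum(sims) / len(sims)

schema = {"title": "text", "year": "numeric"}
a = {"title": "the matrix", "year": 1999}
b = {"title": "the matrix reloaded", "year": 2003}
score = record_similarity(a, b, schema)  # high, but short of 1.0
```

The point of the hybrid design is visible in the dispatch table: each attribute gets the measure suited to its type, rather than forcing one measure on all attributes.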
- Text document: Big graph analysis by visually created workflows (BTW 2019, 2019), Rostami, M. Ali; Peukert, Eric; Wilke, Moritz; Rahm, Erhard. The analysis of large graphs has received considerable attention recently, but current solutions are typically hard to use. In this demonstration paper, we report on an effort to improve the usability of the open-source system Gradoop for processing and analyzing large graphs. This is achieved by integrating Gradoop into the popular open-source software KNIME to visually create graph analysis workflows, without the need for coding. We outline the integration approach and discuss what will be demonstrated.
- Text document: Blockchain in the Context of Business Applications and Enterprise Databases (BTW 2019, 2019), Renkes, Frank; Sommer, Christian. Blockchain seems to be the future of all cross-company business applications. Similar to the adoption of machine learning into novel and existing business applications and processes, we can see the same trend for blockchain: nearly every application tries to leverage blockchain technology to improve its process chains. Is this just hype, or is blockchain really the solution to all problems in which applications rely on intelligent and secure data distribution and sharing? What are the most relevant qualities of blockchain needed in modern business applications, and which role can a traditional database play here? Wouldn't the integration of some of these qualities into traditional databases be a better approach to building so-called 'distributed business applications'? What is the relationship and overlap between core blockchain and core database concepts such as (redo) logging, security features like auditing and encryption, distributed (query) processing, and stored procedures/smart contracts? This talk discusses how blockchain can be integrated into existing business applications and processes, what the biggest challenges are, and which role a traditional database can play in this context.
- Text document: The Borda Social Choice Movie Recommender (BTW 2019, 2019), Kastner, Johannes; Ranitovic, Nemanja; Endres, Markus. In this demo paper we present a recommender system that exploits the Borda social choice voting rule for clustering recommendations in order to produce comprehensible results for a user. Compared to existing clustering techniques like k-means, the overhead of normalizing and preparing the user preference data is dropped. In our demo showcase we facilitate a comparison of our clustering approach to the well-known k-means++ with traditional distance measures.
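For context, the Borda rule that gives the system its name works as follows: each voter ranks m candidates, the candidate in position i (0-based, best first) earns m - 1 - i points, and the point totals define the consensus ranking. A minimal sketch with invented movie data (not from the paper):

```python
def borda_scores(rankings):
    """Aggregate rankings (each best-first) into Borda point totals."""
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (m - 1 - position)
    return scores

# Three users each rank four movies, best first.
votes = [
    ["Alien", "Blade Runner", "Casablanca", "Dune"],
    ["Blade Runner", "Alien", "Dune", "Casablanca"],
    ["Alien", "Dune", "Blade Runner", "Casablanca"],
]
totals = borda_scores(votes)
consensus = sorted(totals, key=totals.get, reverse=True)
# consensus: ["Alien", "Blade Runner", "Dune", "Casablanca"]
```

Because the rule works directly on rank positions, no normalization of the underlying ratings is needed, which matches the abstract's claim about dropped preparation overhead.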
- Text document: BTW2019 - Datenbanksysteme für Business, Technologie und Web (BTW 2019, 2019), Grust, Torsten; Naumann, Felix; Böhm, Alexander; Lehner, Wolfgang; Härder, Theo; Rahm, Erhard; Heuer, Andreas; Klettke, Meike; Meyer, Holger.
- Text document: Building Scalable Machine Learning Solutions for Data Cleaning (BTW 2019, 2019), Ilyas, Ihab. Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details of configuring and deploying ML techniques are the biggest hurdle. In this talk I discuss why leveraging data semantics and domain-specific knowledge is key to delivering the optimizations necessary for truly scalable ML curation solutions. The talk focuses on two main problems: (1) entity consolidation, which is arguably the most difficult data curation challenge because it is notoriously complex and hard to scale; and (2) using probabilistic inference to suggest data repairs for identified errors and anomalies using our new system called HoloClean. Both problems have challenged researchers and practitioners for decades due to the fundamentally combinatorial explosion in the space of solutions and the lack of ground truth. There is a large body of work on these problems in both academia and industry; techniques have included human curation, rules-based systems, and automatic discovery of clusters using predefined thresholds on record similarity. Unfortunately, none of these techniques alone has been able to provide sufficient accuracy and scalability. The talk aims to provide deeper insight into the entity consolidation and data repair problems and discusses how machine learning, human expertise, and problem semantics can collectively deliver a scalable, high-accuracy solution.
- Conference paper: A Classification of Schema Mappings and Analysis of Mapping Tools (Datenbanksysteme in Business, Technologie und Web (BTW 2007) – 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 2007), Legler, Frank; Naumann, Felix. Schema mapping techniques for data exchange have become popular and useful tools both in research and industry. A schema mapping relates a source schema with a target schema via correspondences, which are specified by a domain expert, possibly supported by automated schema matching algorithms. The set of correspondences, i.e., the mapping, is interpreted as a data transformation, usually expressed as a query. These queries transform data from the source schema to conform to the target schema. They can be used to materialize data at the target or used as views in a virtually integrated system. We present a classification of mapping situations that can occur when mapping between two relational or nested (XML) schemata. Our classification takes into consideration 1:1 and n:m correspondences, attribute-level and higher-level mappings, and special constructs, such as choice constraints, cardinality constraints, and data types. Based on this classification, we have developed a general suite of schemata, data, and correspondences to test the ability of tools to cope with the different mapping situations. We evaluated several commercial and research tools that support the definition of schema mappings and interpret such a mapping as a data transformation. We found that no tool performs well in all mapping situations and that many tools produce incorrect data transformations. The test suite can serve as a benchmark for future improvements and developments of schema mapping tools.
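To make the entry's terminology concrete: a set of 1:1 attribute-level correspondences can be read as a data transformation that rewrites source records to conform to the target schema. A deliberately minimal sketch with made-up schemas and names (real mapping tools also handle n:m correspondences, nesting, and constraints such as choice and cardinality):

```python
# Source attribute -> target attribute: a 1:1, attribute-level mapping.
correspondences = {"fullname": "name", "zip": "postal_code"}

def transform(source_records, correspondences):
    """Interpret the correspondences as a query over the source:
    every record is projected and renamed into the target schema;
    source attributes without a correspondence are dropped."""
    return [
        {target: record[source] for source, target in correspondences.items()}
        for record in source_records
    ]

source = [{"fullname": "Ada Lovelace", "zip": "10117", "dept": "R&D"}]
target = transform(source, correspondences)
# target: [{"name": "Ada Lovelace", "postal_code": "10117"}]
```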
- Journal article: Das Fachgebiet „Informationssysteme“ am Hasso-Plattner-Institut (Datenbank-Spektrum: Vol. 17, No. 1, 2017), Naumann, Felix; Krestel, Ralf. The Hasso Plattner Institute (HPI) is a privately funded institute at the University of Potsdam. Its benefactor is Professor Hasso Plattner, co-founder and chairman of the supervisory board of the software company SAP. The Information Systems group, headed by Prof. Dr. Felix Naumann, works on the efficient and effective handling of heterogeneous data and texts. The group was founded in 2006 and currently offers a research environment to 12 doctoral students and roughly 15 master's students.
- Journal article: Data Change Exploration Using Time Series Clustering (Datenbank-Spektrum: Vol. 18, No. 2, 2018), Bornemann, Leon; Bleifuß, Tobias; Kalashnikov, Dmitri; Naumann, Felix; Srivastava, Divesh.