Listing by author "Paradies, Marcus"
- Journal article: An Efficient Blocking Technique for Reference Matching using MapReduce (Datenbank-Spektrum: Vol. 11, No. 1, 2011). Paradies, Marcus. Document clustering has become an increasingly important task in the area of data mining and information retrieval. With growing data volumes, CPU- and memory-efficient techniques for clustering algorithms are receiving considerable attention in the research community. To deal with huge amounts of data (e.g., documents from Wikipedia or CiteSeerX, which are several GB in size), distributed clustering techniques have been designed to provide scalable and flexible approaches. We study the problem of document clustering in the area of Entity Matching, where documents from various data sources are matched together. More specifically, we focus on a common optimization technique called blocking, which reduces the enormous search space by clustering the data sources into smaller groups and processes comparisons only within a group. In this article, we describe our experiences and findings in applying the MapReduce framework to deal with huge bibliographic data sets and to provide a flexible, scalable and easy-to-use blocking technique to reduce the search space for Entity Matching.
- Journal article: Big Graph Data Analytics on Single Machines – An Overview (Datenbank-Spektrum: Vol. 17, No. 2, 2017). Paradies, Marcus; Voigt, Hannes. Driven by a multitude of use cases, graph data analytics has become a hot topic in research and industry. Particularly on big graphs, performing complex analytical queries efficiently to derive new insights is a challenging task. Systems that aim at solving the technical part of this challenge are often referred to as graph processing systems. They allow expressing and executing analytic algorithms and queries, while hiding most of the technical details related to efficiently storing and processing graph data. Since 2010, work on graph processing systems for distributed systems as well as shared memory systems has virtually exploded. In this article, we give an overview of this work with a particular focus on graph processing systems for large multiprocessor machines. We describe the state of the art established in recent years and outline trends and challenges in research and development that point towards the future of graph processing systems.
- Journal article: Editorial (Datenbank-Spektrum: Vol. 17, No. 2, 2017). Voigt, Hannes; Paradies, Marcus; Härder, Theo
- Conference paper: The graph story of the SAP HANA database (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Rudolf, Michael; Paradies, Marcus; Bornhövd, Christof; Lehner, Wolfgang. Many traditional and new business applications work with inherently graph-structured data and therefore benefit from graph abstractions and operations provided in the data management layer. The property graph data model not only offers schema flexibility but also permits managing and processing data and metadata jointly. By having typical graph operations implemented directly in the database engine and exposing them both in the form of an intuitive programming interface and a declarative language, complex business application logic can be expressed more easily and executed very efficiently. In this paper we describe our ongoing work to extend the SAP HANA database with built-in graph data support. We see this as a next step on the way to provide an efficient and intuitive data management platform for modern business applications with SAP HANA.
- Conference paper: Meduse: Interactive and Visual Exploration of Ionospheric Data (BTW 2023, 2023). Reibert, Joshua; Osterthun, Arne; Paradies, Marcus. Spatio-temporal models of ionospheric data are important for atmospheric research and the evaluation of their impact on satellite communications. However, researchers lack tools to visually and interactively analyze these rapidly growing multi-dimensional datasets that cannot be entirely loaded into main memory. Existing tools for large-scale multi-dimensional scientific data visualization and exploration rely on slow, file-based data management support and simplistic client-server interaction that fetches all data to the client side for rendering. In this paper we present our data management and interactive data exploration and visualization system MEDUSE. We demonstrate the initial implementation of the interactive data exploration and visualization component that enables domain scientists to visualize and interactively explore multi-dimensional ionospheric data. Use-case-specific visualizations additionally allow the analysis of such data along satellite trajectories to accommodate domain-specific analyses of the impact on data collected by satellites, such as for global navigation satellite systems and earth observation.