Browsing by subject "Distributed Computing"
1 - 4 of 4
- Text document: An Actor Database System for Akka (BTW 2019 – Workshopband, 2019). Schmidl, Sebastian; Schneider, Frederic; Papenbrock, Thorsten. System architectures for data-centric applications commonly comprise two tiers: an application tier and a data tier. The fact that these tiers do not typically share a common format for data is referred to as object-relational impedance mismatch. To mitigate this, we develop an actor database system that enables the implementation of application logic in the data storage runtime. The actor model also allows for easy distribution of both data and computation across multiple nodes in a cluster. More specifically, we propose the concept of domain actors, which provide a type-safe, SQL-like interface to develop the actors of our database system, and the concept of Functors to build queries retrieving data contained in multiple actor instances. Our experiments demonstrate the feasibility of encapsulating data into domain actors by evaluating their memory overhead and performance. We also discuss how our proposed actor database system framework solves some of the challenges that arise from the design of distributed databases, such as data partitioning, failure handling, and concurrent query processing.
- Conference paper: The Big Picture: Understanding large-scale graphs using Graph Grouping with GRADOOP (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Junghanns, Martin; Petermann, André; Teichmann, Niklas; Rahm, Erhard. Graph grouping supports data analysts in decision making based on the characteristics of large-scale, heterogeneous networks containing millions or even billions of vertices and edges. We demonstrate graph grouping with GRADOOP, a scalable system supporting declarative programs composed from multiple graph operations. Using social network data, we highlight the analytical capabilities enabled by graph grouping in combination with other graph operators. The resulting graphs are visualized, and visitors are invited to either modify existing analytical programs or write new ones. GRADOOP is implemented on top of Apache Flink, a state-of-the-art distributed dataflow framework, and thus allows us to scale graph analytical programs across multiple machines. In the demonstration, programs can be executed either locally or remotely on our research cluster.
- Conference paper: Distributed Grouping of Property Graphs with GRADOOP (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Junghanns, Martin; Petermann, André; Rahm, Erhard. Property graphs are an intuitive way to model, analyze and visualize complex relationships among heterogeneous data objects, for example, as they occur in social, biological and information networks. These graphs typically contain thousands or millions of vertices and edges, and their entire representation can easily overwhelm an analyst. One way to reduce complexity is the grouping of vertices and edges into summary graphs. In this paper, we present an algorithm for graph grouping with support for attribute aggregation and structural summarization by user-defined vertex and edge properties. The algorithm is part of GRADOOP, an open-source system for graph analytics. GRADOOP is implemented on top of Apache Flink, a state-of-the-art distributed dataflow framework, and thus allows us to scale graph analytical programs across multiple machines. Our evaluation demonstrates the scalability of the algorithm on real-world and synthetic social network data.
- Text document: Graph Sampling with Distributed In-Memory Dataflow Systems (BTW 2021, 2021). Gomez, Kevin; Täschner, Matthias; Rostami, M. Ali; Rost, Christopher; Rahm, Erhard. Given a large graph, graph sampling determines a subgraph with similar characteristics for certain metrics of the original graph. The samples are much smaller, thereby accelerating and simplifying the analysis and visualization of large graphs. We focus on the implementation of distributed graph sampling for Big Data frameworks and in-memory dataflow systems such as Apache Spark or Apache Flink and evaluate the scalability of the new implementations. The presented methods will be open source and be integrated into Gradoop, a system for distributed graph analytics.
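
The graph grouping described in the two GRADOOP entries above can be illustrated with a small single-machine sketch. It is not GRADOOP's API or the authors' distributed algorithm: the function name `group_graph`, the data layout, and the sample graph are hypothetical, and a plain count is assumed as the only aggregate. It only shows the basic idea of collapsing vertices that share a property value into one summary vertex and merging the edges between groups.

```python
from collections import Counter


def group_graph(vertices, edges, vertex_key, edge_key):
    """Collapse a property graph into a summary graph (illustrative only).

    vertices: dict mapping vertex id -> property dict
    edges: list of (source id, target id, property dict) tuples
    vertex_key / edge_key: property names used as grouping keys
    """
    # Assign every vertex to the group given by its grouping property.
    group_of = {vid: props.get(vertex_key) for vid, props in vertices.items()}

    # One summary vertex per group; the aggregate is the member count.
    summary_vertices = Counter(group_of.values())

    # One summary edge per (source group, target group, edge key value);
    # the aggregate is the number of original edges it represents.
    summary_edges = Counter(
        (group_of[src], group_of[dst], props.get(edge_key))
        for src, dst, props in edges
    )
    return dict(summary_vertices), dict(summary_edges)


# Hypothetical sample data: three persons in two cities.
vertices = {
    1: {"label": "Person", "city": "Leipzig"},
    2: {"label": "Person", "city": "Leipzig"},
    3: {"label": "Person", "city": "Dresden"},
}
edges = [
    (1, 2, {"label": "knows"}),
    (2, 1, {"label": "knows"}),
    (1, 3, {"label": "knows"}),
]

summary_vertices, summary_edges = group_graph(vertices, edges, "city", "label")
```

Here the three person vertices collapse into one summary vertex per city, and the three `knows` edges collapse into one summary edge per pair of cities. A distributed implementation would express the same two grouping steps as dataflow operations rather than in-memory dictionaries.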