Browsing by keyword "Data Integration"
- Conference paper: Combining the Concepts of Semantic Data Integration and Edge Computing (INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft, 2019). Farnbauer-Schmidt, Matthias; Lindner, Julian; Kaffenberger, Christopher; Albrecht, Jens. The Internet of Things (IoT) is growing rapidly. As a result, there are more and more vendors, which has led to the IoT becoming a heterogeneous collection of different IoT platforms, isolated solutions and several protocols. It has been proposed to use Data Integration to overcome this heterogeneity. In addition, costs are on the rise due to the increasing volume of data, which increases demands on bandwidth and cloud computing capabilities. Here, too, a solution has already been proposed: reducing the amount of data to forward by processing it at the edge of an IoT system, e.g. by filtering or aggregation. This concept is called Edge Computing. In this article the Semantic Edge Computing Runtime (SECR) is introduced, combining both concepts. The application of Data Integration enables Edge Computing to be performed on a higher level of abstraction. In addition, the developed Driver approach allows SECR's Data Integration algorithm to be applied to a wide range of data sources without imposing requirements on them. The Data Integration itself is based on Semantic Web technologies, applying metadata to raw data to give it context for interpretation. Furthermore, SECR's REST API enables applications to alter Data Integration and Edge Computing at runtime. The tests of SECR's prototype implementation have shown its suitability for deployment on an edge device and its scalability, being able to handle 128 data sources and Edge Computing Tasks.
- Text document: Graph Sampling with Distributed In-Memory Dataflow Systems (BTW 2021, 2021). Gomez, Kevin; Täschner, Matthias; Rostami, M. Ali; Rost, Christopher; Rahm, Erhard. Given a large graph, graph sampling determines a subgraph with similar characteristics for certain metrics of the original graph. The samples are much smaller, thereby accelerating and simplifying the analysis and visualization of large graphs. We focus on the implementation of distributed graph sampling for Big Data frameworks and in-memory dataflow systems such as Apache Spark or Apache Flink and evaluate the scalability of the new implementations. The presented methods will be open source and be integrated into Gradoop, a system for distributed graph analytics.
- Conference paper: Metadata Management for Data Integration in Medical Sciences (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Kirsten, Toralf; Kiel, Alexander; Rühle, Mathias; Wagner, Jonas. Clinical and epidemiological studies are commonly used in medical sciences. They typically collect data by using different input forms and information systems. Metadata describing input forms, database schemas and input systems are used for data integration but are typically distributed over different software tools; each tool uses portions of the metadata, such as for loading (ETL), data presentation and analysis. In this paper, we describe an approach to managing metadata centrally and consistently in a dedicated Metadata Repository (MDR). Metadata can be provided to different tools. Moreover, the MDR includes a matching component that creates schema mappings as a prerequisite for integrating captured medical data. We describe the approach and the MDR infrastructure, and provide algorithms for creating schema mappings. Finally, we show selected evaluation results. The MDR is fully operational and used to integrate data from a multitude of input forms and systems in the epidemiological study LIFE.
- Conference paper: PiPa: custom integration of protein interactions and pathways (INFORMATIK 2011 – Informatik schafft Communities, 2011). Arzt, Sebastian; Starlinger, Johannes; Arnold, Oliver; Kröger, Stefan; Jaeger, Samira; Leser, Ulf. Information about proteins and their relationships to each other is a common source of input for many areas of Systems Biology, such as protein function prediction, relevance-ranking of disease genes and simulation of biological networks. While there are numerous databases that focus on collecting such data from, for instance, literature curation, expert knowledge, or experimental studies, their individual coverage is often low, making the building of an integrated protein-protein interaction database a pressing need. Accordingly, a number of such systems have emerged. But in most cases their content is only accessible over the web on a per-protein basis, which renders them useless for automatic analysis of sets of proteins. Even if the databases are available for download, certain data sources are often missing (e.g. because redistribution is forbidden by license), and update intervals are sporadic. We present PiPa, a system for the integration of protein-protein interactions (PPI) and pathway data. PiPa is a stand-alone tool for loading and updating a large number of common PPI and pathway databases into a homogeneously structured relational database. PiPa features a graphical administration tool for monitoring its state, triggering updates, and computing statistics on the content. Due to its modular architecture, adding new data sources is easy. The software is freely available from the authors.