GI Digital Library :: Browsing by subject "semi-structured data"

Browsing by subject "semi-structured data"

1 - 2 of 2

Conference paper
Converting data organised for visual perception into machine-readable formats
(44. GIL - Jahrestagung, Biodiversität fördern durch digitale Landwirtschaft, 2024) Aue, Alexander; Ackermann, Andrea; Röder, Norbert
Spreadsheets store an extraordinary amount of important data. They are popular largely because they are easy to use and give users great flexibility in how they store their data. Users often apply a variety of layout techniques to make the data easy for humans to understand, but this layout creates problems for traditional Extract-Transform-Load (ETL) tools. We propose a program that allows users to easily extract data from Excel files by selecting the cells containing the data and metadata, thereby determining the data hierarchy. We have used this program to extract data from the Agricultural Structure Survey on land use and livestock in Germany. The survey does not follow a nationwide standard, which leads to large differences in how the federal states structure the data and makes it a good benchmark.
Conference paper
An FPGA Avro Parser Generator for Accelerated Data Stream Processing
(BTW 2023, 2023) Hahn, Tobias; Schüll, Daniel; Wildermann, Stefan; Teich, Jürgen
Big Data applications frequently involve processing data streams encoded in semi-structured data formats such as JSON, Protobuf, or Avro. A major challenge in accelerating data stream processing on FPGAs is that the parsing of such data formats is usually highly complex. This is especially true for JSON parsing on FPGAs, which lies in the focus of related work. The parsing of the binary Avro format, on the other hand, is perfectly suited for being processed on FPGAs and can thus serve as an enabler for data stream processing on FPGAs. In this realm, we present a methodology for parsing, projection, and selection of Avro objects, which enforces an output format suitable for further processing on the FPGA. Moreover, we provide a generator to automatically create accelerators based on this methodology. The obtained accelerators can achieve significant speedups compared to CPU-based parsers, and at the same time require only very few FPGA resources.
