P331 - BTW 2023 - Datenbanksysteme für Business, Technologie und Web
- Conference paper: Accelerating Large Table Scan using Processing-In-Memory Technology (BTW 2023, 2023) Baumstark, Alexander; Jibril, Muhammad Attahir; Sattler, Kai-Uwe. Today’s systems are capable of storing large amounts of data in main memory. In-memory DBMSs can benefit particularly from this development. However, the processing of data from main memory necessarily has to run via the CPU. This creates a bottleneck that limits the achievable performance of the DBMS. Processing-In-Memory (PIM) technology is a paradigm to overcome this problem, but it was not available in commercial systems for a long time. With UPMEM, a commercial system is finally available that provides PIM technology in hardware. This work focuses on optimizing the table scan, a fundamental and memory-bound operation. It presents a possible approach for optimizing this operation using PIM. The method was then tested for parallelism and execution time in benchmarks with different table sizes and compared to the usual table scan. The result is a table scan that significantly outperforms the scan on the usual CPU.
- Conference paper: Adaptive Architectures for Robust Data Management Systems (BTW 2023, 2023) Bang, Tiemo. "Form follows function" is a well-known expression by the architect Sullivan asserting that the architecture of a building should follow its function. 'Adaptive Architectures for Robust Data Management Systems' is a dissertation asserting that DBMS architectures should follow changing workload and hardware to robustly achieve high DBMS performance. The dissertation first evaluates how workload and hardware affect the performance of DBMSs with static architectures. This evaluation concludes that static DBMS architectures degrade DBMS performance under changing workload and hardware, and hence the DBMS architecture has to become adaptive. Subsequently, adaptation concepts for the architecture of single-server and multi-server DBMSs are proposed. These concepts focus on fine-grained adaptation of DBMS architectures and are realized through asynchronous programming models. These programming models decouple the implementation of DBMS components from fine-grained architectural optimization. Thereby, optimizers can derive novel architectures that better fit individual DBMS components, leading to high and robust DBMS performance under changing conditions.
- Conference paper: Approach to Synthetic Data Generation for Imbalanced Multi-class Problems with Heterogeneous Groups (BTW 2023, 2023) Treder-Tschechlov, Dennis; Reimann, Peter; Schwarz, Holger; Mitschang, Bernhard. To benchmark novel classification algorithms, these algorithms should be evaluated on data with characteristics that also appear in real-world use cases. Important data characteristics that often lead to challenges for classification approaches are multi-class imbalance and heterogeneous groups. Real-world data that comprise these characteristics are usually not publicly available, e.g., because they constitute sensitive patient information or due to privacy concerns. Further, the manifestations of the characteristics cannot be controlled specifically on real-world data. A more rigorous approach is to synthetically generate data such that different manifestations of the characteristics can be controlled. However, existing data generators are not able to generate data that feature both data characteristics, i.e., multi-class imbalance and heterogeneous groups. In this paper, we propose an approach that fills this gap, as it allows synthetically generating data that exhibit both characteristics. In particular, we make use of a taxonomy model that organizes real-world entities in domain-specific heterogeneous groups to generate data reflecting the characteristics of these groups. In addition, we incorporate probability distributions to reflect the imbalances of multiple classes and groups from real-world use cases. Our approach is applicable in different domains, as taxonomies are the simplest form of knowledge models and thus are available in many domains. The evaluation shows that our approach can generate data that feature the data characteristics multi-class imbalance and heterogeneous groups, and that it allows controlling different manifestations of these characteristics.
- Conference paper: Automated Statement Extraction from Press Briefings (BTW 2023, 2023) Keller, Jüri; Bittkowski, Meik; Schaer, Philipp. Scientific press briefings are a valuable information source. They consist of alternating expert speeches, questions from the audience, and their answers. Therefore, they can contribute to scientific and fact-based media coverage. Even though press briefings are highly informative, extracting statements relevant to individual journalistic tasks is challenging and time-consuming. To support this task, an automated statement extraction system is proposed. Claims are used as the main feature to identify statements in press briefing transcripts. The statement extraction task is formulated as a four-step procedure. First, the press briefings are split into sentences and passages; then claim sentences are identified with a single-label multi-class sequence classification. Subsequently, topics are detected, and the sentences are filtered to improve the coherence and assess the length of the statements. The results indicate that claim detection can be used to identify statements in press briefings. While many statements can be extracted automatically with this system, they are not always as coherent as needed to be understood without context and may need further review by knowledgeable persons.
- Conference paper: Benchmarking the Second Generation of Intel SGX for Machine Learning Workloads (BTW 2023, 2023) Lutsch, Adrian; Singh, Gagandeep; Mundt, Martin; Mogk, Ragnar; Binnig, Carsten. For domains with high data privacy and protection demands, such as health care and finance, outsourcing machine learning tasks often requires additional security measures. Trusted Execution Environments like Intel SGX are a powerful tool to achieve this additional security. Until recently, Intel SGX incurred high performance costs, mainly because it was severely limited in terms of available memory and CPUs. With the second generation of SGX, Intel alleviates these problems. Therefore, we revisit previous use cases for ML secured by SGX and show initial results of a performance study for ML workloads on SGXv2.
- Conference paper: Better Safe than Sorry: Visualizing, Predicting, and Successfully Guiding Courses of Study (BTW 2023, 2023) Kerth, Alexander; Schuhknecht, Felix; Pensel, Lukas; Henneberg, Justus. Successfully going through a course of study is a lengthy and challenging task. To obtain a degree, many obstacles must be overcome and the right decisions must be made at the right point in time, often overwhelming students. To reduce the number of dropouts, the goal of study advisors is to reach out to endangered students in time and to provide them with help and guidance. To support the work of study advisors, who typically have to monitor a large number of students simultaneously, we present in this demonstration an easy-to-use graphical tool that (a) allows the advisor to visualize all relevant information of study data in a responsive graph in order to get an overview of the current study situation. In addition to visualization, our tool provides (b) a forecasting functionality based on pre-trained models and (c) a warning feature to identify endangered students early on. In the on-site demonstration, the audience will be able to step into the role of a study advisor and use our tool and all of its features to identify and guide struggling students within anonymized real-world study data.
- Conference paper: BTW 2023 - Complete proceedings (BTW 2023, 2023) Köhnen, Christoph
- Conference paper: CLOCQ: A Toolkit for Fast and Easy Access to Knowledge Bases (BTW 2023, 2023) Christmann, Philipp; Roy, Rishiraj Saha; Weikum, Gerhard. Curated knowledge bases (KBs) store vast amounts of factual world knowledge, and are therefore ubiquitous in many information retrieval (IR) and natural language processing (NLP) applications like question answering, named entity disambiguation, or knowledge exploration. Despite that, accessing information from complete knowledge bases is often a daunting task. Researchers and practitioners typically have crisp use cases in mind, for which standard querying interfaces can be overly complex and inefficient. We aim to bridge this gap by releasing a public toolkit that provides functionalities for common KB access use cases and by making it available via a public API. Experiments show efficiency improvements over existing KB interfaces for various important functionalities.
- Conference paper: Communication-Optimal Parallel Reservoir Sampling (BTW 2023, 2023) Winter, Christian; Sichert, Moritz; Birler, Altan; Neumann, Thomas; Kemper, Alfons. When evaluating complex analytical queries on high-velocity data streams, many systems cannot run those queries on all elements of a stream. Sampling is a widely used method to reduce the system load by replacing the input with a representative yet manageable subset. For unbounded data, reservoir sampling generates a fixed-size uniform sample independent of the input cardinality. However, the collection of reservoir samples itself can already be a bottleneck for high-velocity data. In this paper, we introduce a technique that allows fully parallelizing reservoir sampling for many-core architectures. Our approach relies on the efficient combination of thread-local samples taken over chunks of the input, without necessitating communication during the sampling phase and with minimal communication when merging. We show how our efficient merge guarantees uniform random samples while allowing data to be distributed over worker threads arbitrarily. Our analysis of this approach within the Umbra database system demonstrates linear scaling with the number of available threads and the ability to sustain high-velocity workloads.
- Conference paper: A Core Ontology to Support Agricultural Data Interoperability (BTW 2023, 2023) Abdelmageed, Aly; Hatem, Shahenda; ael, Tasneem; Medhat, Walaa; König-Ries, Birgitta; Ellakwa, Susan; Elkafrawy, Passent; Algergawy, Alsayed. The amount and variety of raw data generated in the agriculture sector from numerous sources, including soil sensors and local weather stations, are proliferating. However, these raw data in themselves are meaningless and isolated and, therefore, may offer little value to the farmer. The usefulness of data is determined by its context and meaning and by how interoperable it is with data from other sources. Semantic web technology can provide context and meaning to data and its aggregation by providing standard data interchange formats and description languages. In this paper, we introduce the design and overall description of a core ontology that facilitates the process of data interoperability in the agricultural domain.
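
The abstract of "Communication-Optimal Parallel Reservoir Sampling" above describes combining thread-local samples taken over chunks of the input into one uniform sample. The following is a minimal Python sketch of that general idea, not the authors' algorithm and not Umbra code: it uses the textbook Algorithm R per chunk and a standard hypergeometric-style merge, and it processes chunks sequentially rather than in real threads. All function names (`reservoir_sample`, `merge_samples`, `sample_chunks`) are hypothetical and chosen only for this illustration.

```python
import random


def reservoir_sample(chunk, k, rng):
    """Algorithm R: uniform sample of up to k items from one chunk.

    Returns the sample and the number of items seen in the chunk.
    """
    sample, count = [], 0
    for item in chunk:
        count += 1
        if len(sample) < k:
            sample.append(item)
        else:
            j = rng.randrange(count)  # uniform in [0, count)
            if j < k:
                sample[j] = item
    return sample, count


def merge_samples(sample_a, n_a, sample_b, n_b, k, rng):
    """Merge two uniform samples (drawn from n_a and n_b items) into one.

    The number of items taken from sample_a is drawn as if picking
    min(k, n_a + n_b) items without replacement from the combined input,
    which keeps the merged sample uniform over both chunks.
    """
    size = min(k, n_a + n_b)
    take_a, rem_a, rem_b = 0, n_a, n_b
    for _ in range(size):
        if rng.randrange(rem_a + rem_b) < rem_a:
            take_a += 1
            rem_a -= 1
        else:
            rem_b -= 1
    merged = rng.sample(sample_a, take_a) + rng.sample(sample_b, size - take_a)
    return merged, n_a + n_b


def sample_chunks(chunks, k, seed=0):
    """Sample each chunk independently, then fold the local samples together.

    Chunks are handled one after another here; in a parallel setting each
    chunk would be sampled by its own worker before merging (an assumption
    of this sketch, not shown in code).
    """
    rng = random.Random(seed)
    merged, total = [], 0
    for chunk in chunks:
        local, n = reservoir_sample(chunk, k, rng)
        merged, total = merge_samples(merged, total, local, n, k, rng)
    return merged, total
```

For example, `sample_chunks([range(0, 100), range(100, 130), range(130, 1000)], k=10)` returns ten distinct values together with the total input size of 1000. The merge only needs each local sample and its chunk size, which is why no communication is required while the chunks are being sampled.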