Listing by author "Binnig, Carsten"
1 - 10 of 13
- Conference paper: Benchmarking the Second Generation of Intel SGX for Machine Learning Workloads (BTW 2023, 2023). Lutsch, Adrian; Singh, Gagandeep; Mundt, Martin; Mogk, Ragnar; Binnig, Carsten.
  For domains with high data privacy and protection demands, such as health care and finance, outsourcing machine learning tasks often requires additional security measures. Trusted Execution Environments like Intel SGX are a powerful tool to achieve this additional security. Until recently, Intel SGX incurred high performance costs, mainly because it was severely limited in terms of available memory and CPUs. With the second generation of SGX, Intel alleviates these problems. Therefore, we revisit previous use cases for ML secured by SGX and show initial results of a performance study for ML workloads on SGXv2.
- Journal article: Die SIKOSA-Methodik (Wirtschaftsinformatik: Vol. 49, No. 3, 2007). Weiß, Daniel; Kaack, Jörn; Kirn, Stefan; Gilliot, Maike; Lowis, Lutz; Müller, Günter; Herrmann, Andrea; Binnig, Carsten; Illes, Timea; Paech, Barbara; Kossmann, Donald.
  Key points: The SIKOSA method describes an end-to-end approach to software development. It translates functional and non-functional process requirements into corresponding testable business-software requirements and increases validity. The method is illustrated using an industrial procurement process.
  Abstract: The SIKOSA method addresses the consistency of quality-assuring methods in software engineering processes. To support industrial business software production, a new consistent and quality-oriented method was developed. Starting from the business processes that have to be supported, functional and non-functional requirements of business software are specified and transferred into automated test cases and test data.
- Text document: DPI: The Data Processing Interface for Modern Networks (Extended Abstract) (BTW 2019 – Workshopband, 2019). Binnig, Carsten.
- Text document: Generierung Relevanter Testdatenbanken (Ausgezeichnete Informatikdissertationen 2008, 2009). Binnig, Carsten.
  In today's software development projects, testing is one of the most cost- and time-intensive activities. As a recent NIST report [RTI02] shows, software defects caused between 22.2 and 59.5 billion dollars in costs in the USA in the year 2000. Consequently, various methods and tools have been developed in recent years to reduce these high costs. Many of these tools serve to automate the different testing tasks (e.g., creating test cases, executing test cases, and checking the test results). However, only little research exists on automating the testing of database applications (such as an e-shop) or of relational database management systems (DBMS). The doctoral thesis underlying this article discusses an important problem from this area: the generation of test databases. Several research prototypes as well as commercial database generators exist for creating a test database. However, these are mostly general-purpose solutions that generate the test databases independently of the test cases to be executed. As a result, the generated test databases usually do not exhibit the data characteristics required to execute particular test cases. In contrast to these tools, this article describes innovative approaches for generating relevant test databases. The general idea is that the user can explicitly formulate, in a declarative way, the conditions that the test data must satisfy for one or more test cases. These conditions are then used to generate a test database that exhibits the desired data characteristics needed to execute the test cases.
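The declarative idea in the abstract above can be illustrated with a minimal sketch. All names and the constraint format here are hypothetical illustrations, not the thesis's actual tool: a test-case condition is expressed as a predicate plus a minimum row count, and the generator produces a table that satisfies every condition before padding with filler rows.

```python
import random

def generate_test_table(n_rows, constraints, value_range=(0, 100)):
    """Generate a single-column test table whose rows satisfy a set of
    declaratively stated conditions (predicate, minimum row count).

    `constraints` is a list of (predicate, min_count) pairs -- a toy
    stand-in for the declarative test-case conditions described above.
    """
    rows = []
    # First satisfy each declared condition explicitly ...
    for predicate, min_count in constraints:
        produced = 0
        while produced < min_count:
            v = random.randint(*value_range)
            if predicate(v):
                rows.append(v)
                produced += 1
    # ... then pad with unconstrained filler rows up to the target size.
    while len(rows) < n_rows:
        rows.append(random.randint(*value_range))
    random.shuffle(rows)
    return rows

# Hypothetical test case: the query under test needs at least 10 "senior"
# customers (age > 65) and at least 5 minors (age < 18) in the database.
table = generate_test_table(
    100,
    [(lambda age: age > 65, 10), (lambda age: age < 18, 5)],
)
```

A real generator would of course work on multi-column relational schemas and solve the conditions symbolically rather than by rejection sampling; the sketch only shows why test-case-driven generation yields data with the required characteristics, unlike a general-purpose generator.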
- Conference paper: IncMap: A Journey towards Ontology-based Data Integration (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Pinkel, Christoph; Binnig, Carsten; Jimenez-Ruiz, Ernesto; Kharlamov, Evgeny; Nikolov, Andriy; Schwarte, Andreas; Heupel, Christian; Kraska, Tim.
  Ontology-based data integration (OBDI) allows users to federate over heterogeneous data sources using a semantically rich conceptual data model. An important challenge in OBDI is the curation of mappings between the data sources and the global ontology. In the last years, we have built IncMap, a system to semi-automatically create mappings between relational data sources and a global ontology. IncMap has since been put into practice, both in academic and in industrial applications. Based on the experience of the last years, we have extended the original version of IncMap in several dimensions to enhance the mapping quality: (1) IncMap can detect and leverage semantically rich patterns in the relational data sources, such as inheritance, for the mapping creation. (2) IncMap is able to leverage reasoning rules in the ontology to overcome structural differences from the relational data sources. (3) IncMap now includes a fully automatic mode that is often necessary to bootstrap mappings for a new data source. Our experimental evaluation shows that the new version of IncMap outperforms its previous version as well as other state-of-the-art systems.
- Journal article: RDMA Communciation Patterns (Datenbank-Spektrum: Vol. 20, No. 3, 2020). Ziegler, Tobias; Leis, Viktor; Binnig, Carsten.
  Remote Direct Memory Access (RDMA) is a networking protocol that provides high-bandwidth and low-latency accesses to a remote node's main memory. Although there has been much work around RDMA, such as building libraries on top of RDMA or even applications leveraging RDMA, it remains a hard problem to identify the most suitable RDMA primitives and their combination for a given problem. While some initial studies included in papers aim to investigate selected performance characteristics of particular design choices, there has not been a systematic study evaluating the communication patterns of scale-out systems. In this paper, we address this issue by systematically investigating how to efficiently use RDMA for building scale-out systems.
- Journal article: Scalable Data Management on Modern Networks (Datenbank-Spektrum: Vol. 18, No. 3, 2018). Binnig, Carsten.
- Text document: Skew-resilient Query Processing for Fast Networks (BTW 2019 – Workshopband, 2019). Ziegler, Tobias; Binnig, Carsten; Röhm, Uwe.
  Motivation: Scalable distributed in-memory databases are at the core of data-intensive computation. Although scale-out solutions help to handle large amounts of data, more nodes do not necessarily lead to improved query performance. In fact, recent papers have shown that performance can even degrade when scaling out, due to higher communication overhead (e.g., shuffling data across nodes) and limited bandwidth [Rö15]. Thus, current distributed database systems are built on the assumption that the network is the major bottleneck [BH13] and should be avoided at all costs. In recent years, high-speed networks (e.g., InfiniBand (IB)) with a bandwidth close to the local memory bus [Bi16] have become economically viable. These network technologies provide Remote Direct Memory Access (RDMA) to allow direct memory access on a remote host, and they also reduce the latency of data transfer by bypassing the remote host's CPU [In17, Gr10]. Therefore, the assumption that the network is the bottleneck no longer holds. Consequently, recent research has focused on integrating RDMA-enabled high-speed networks into existing database systems designed along a Shared-Nothing Architecture (SN) [Rö16, LYB17]. This architecture co-locates computation and data to reduce the communication overhead in a cluster. Although combining SN with IB's higher network bandwidth enables scalability to a certain extent, this approach fails if the data or workload is skewed and cannot be evenly partitioned. The root cause is that classical query execution schemes assume that each partition is processed by one node. Since nodes with larger partitions must process more data, they may become a bottleneck and hinder the overall scalability. In consequence, only utilizing the higher bandwidth without adapting the database architecture and query execution does not automatically lead to improved scalability [Bi16].
  Contributions: In this paper, we present a new approach to execute distributed queries on fast networks with RDMA. Our main contribution is a novel execution strategy that enables collaborative query processing via remote work stealing to mitigate skew, as skew is a common issue that hinders scalable query execution [WDJ91, Ly88]. Moreover, we implement this execution strategy in our prototype engine I-Store and show that it introduces almost no overhead to handle skew.
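The work-stealing idea in the abstract above can be sketched in a few lines. This is a hypothetical single-machine illustration using threads, not I-Store's RDMA-based implementation: instead of binding one worker to one partition, every partition is split into small "morsels" in a shared queue, so idle workers steal the next morsel and the skewed partition no longer serializes the query.

```python
import queue
import threading

def process_skewed_partitions(partitions, n_workers=4, morsel_size=100):
    """Toy illustration of skew-resilient execution: all partitions are
    split into small morsels in a shared queue, and every idle worker
    steals the next morsel instead of waiting for 'its' partition."""
    work = queue.Queue()
    for part in partitions:
        for start in range(0, len(part), morsel_size):
            work.put(part[start:start + morsel_size])

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                morsel = work.get_nowait()
            except queue.Empty:
                return                       # no work left to steal
            local_sum = sum(morsel)          # stand-in for real operator work
            with lock:
                results.append(local_sum)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

# Heavily skewed input: one huge partition and three tiny ones. With
# one-worker-per-partition, the first worker would do nearly all the work.
parts = [list(range(10_000)), [1] * 10, [2] * 10, [3] * 10]
total = process_skewed_partitions(parts)
```

In the distributed setting of the paper, "stealing" a morsel means reading it from the remote node's memory over RDMA, which is what makes the scheme cheap enough to leave in place even without skew.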
- Conference paper: SportsTables: A new Corpus for Semantic Type Detection (BTW 2023, 2023). Langenecker, Sven; Sturm, Christoph; Schalles, Christian; Binnig, Carsten.
  Table corpora such as VizNet or TURL, which contain annotated semantic types per column, are important for building machine learning models for the task of automatic semantic type detection. However, there is a huge discrepancy between the corpora used for training and testing and real-world data, since real-world data lakes contain a huge fraction of numerical data that is not present in existing corpora. Hence, in this paper, we introduce a new corpus that contains a much higher proportion of numerical columns than existing corpora. To reflect the distribution in real-world data lakes, our corpus SportsTables has on average approx. 86% numerical columns, posing new challenges to existing semantic type detection models, which have mainly targeted non-numerical columns so far. To demonstrate this effect, we show the results of a first study using a state-of-the-art approach for semantic type detection on our new corpus and demonstrate significant performance differences in predicting semantic types for textual and numerical data.
- Conference paper: Spotlytics: How to Use Cloud Market Places for Analytics? (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Kraska, Tim; Dadashov, Elkhan; Binnig, Carsten.
  In contrast to fixed-price cloud computing services, Amazon's Spot market uses a demand-driven pricing model for renting out virtual machine instances. This allows for remarkable savings when used intelligently. However, a peculiarity of Amazon's Spot market is that machines can suddenly be taken away from the user if the price on the market increases. This can be considered a distinct form of machine failure. In this paper, we first analyze Amazon's current Spot market rules and, based on the results, develop a general market model. This model is valid for Amazon's current Spot service, for many potential variations of it, and for other cloud computing markets. Using the developed market model, we then make recommendations on how to deploy analytical systems with the following three fault-tolerance/recovery strategies: re-execution as used by traditional database systems, checkpointing as used, for example, by Hadoop, and lineage-based recovery as used, for example, by Spark. The main insights are that for traditional database systems, using significantly more instances/machines can be cheaper; for systems with checkpoint-based recovery, the opposite is true; and lineage-based recovery is not beneficial for cloud markets at all.
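The trade-off stated in the last sentence of the abstract above can be made concrete with a back-of-the-envelope model. All numbers and the loss formula here are illustrative assumptions, not the paper's actual market model: with re-execution, a revocation wastes everything done so far, so spreading the job over more machines shortens each machine's exposure window; with checkpointing, only the work since the last checkpoint is lost, so a shorter exposure window buys little.

```python
def expected_lost_work(job_hours, revocation_prob_per_hour, checkpoint_interval=None):
    """Expected hours of wasted work under a toy revocation model.

    Illustrative assumption: a revocation strikes each hour independently
    with the given probability. With re-execution (no checkpoints), all
    work since the start is lost; with checkpointing, only the work since
    the most recent checkpoint is lost.
    """
    lost = 0.0
    for hour in range(int(job_hours)):
        if checkpoint_interval is None:
            # re-execution: a hit in this hour loses hour+1 hours of work
            lost += revocation_prob_per_hour * (hour + 1)
        else:
            # checkpointing: only the work since the last checkpoint is lost
            lost += revocation_prob_per_hour * ((hour % checkpoint_interval) + 1)
    return lost

# A 10-hour job with a 5% revocation chance per hour (made-up numbers):
reexec = expected_lost_work(10, 0.05)                       # lose-everything model
ckpt = expected_lost_work(10, 0.05, checkpoint_interval=2)  # bounded loss
# Scaling the re-execution job out to 10 machines shrinks it to 1 hour each:
scaled = expected_lost_work(1, 0.05) * 10
```

Under these toy numbers, scaling out sharply reduces the expected loss for re-execution (`scaled` is far below `reexec`), while checkpointing already bounds the loss on its own, which mirrors the paper's insight that more machines pay off for re-execution but not for checkpoint-based recovery.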