Listing by author "Markl, Volker"
1 - 10 of 70
- Conference paper: Applying Stratosphere for big data analytics (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Leich, Marcus; Adamek, Jochen; Schubotz, Moritz; Heise, Arvid; Rheinländer, Astrid; Markl, Volker.
  Analyzing big data sets as they occur in modern business and science applications requires query languages that allow for the specification of complex data processing tasks. Moreover, these ideally declarative query specifications have to be optimized, parallelized, and scheduled for processing on massively parallel data processing platforms. This paper demonstrates the application of Stratosphere to different kinds of big data analytics tasks. Using examples from different application domains, we show how to formulate analytical tasks as Meteor queries and execute them with Stratosphere. These examples include data cleansing and information extraction tasks, as well as a correlation analysis of microblogging and stock trade volume data that we describe in detail in this paper.
- Conference paper: A Bayesian approach to estimating the selectivity of conjunctive predicates (Datenbanksysteme in Business, Technologie und Web (BTW) – 13. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 2009). Heimel, Max; Markl, Volker; Murthy, Keshava.
  Cost-based optimizers in relational databases make use of data statistics to estimate intermediate result cardinalities. Those cardinalities are needed to estimate access plan costs in order to choose the cheapest plan for executing a query. Since statistics …
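The estimation problem this abstract addresses can be illustrated with a small sketch (illustrative only, not the paper's Bayesian method): a cost-based optimizer that assumes independence between predicates multiplies their individual selectivities, which can be far off when columns are correlated. All table values here are made up.

```python
# Toy table with correlated columns: most Mazdas in this sample are red.
rows = [
    {"make": "Mazda", "color": "red"},
    {"make": "Mazda", "color": "red"},
    {"make": "Mazda", "color": "blue"},
    {"make": "BMW",   "color": "red"},
]

def selectivity(pred):
    """Fraction of rows satisfying a predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)

p1 = lambda r: r["make"] == "Mazda"
p2 = lambda r: r["color"] == "red"

# Independence assumption: sel(p1 AND p2) ~= sel(p1) * sel(p2)
independent_estimate = selectivity(p1) * selectivity(p2)

# True joint selectivity, which column correlation pushes away from the estimate
true_joint = selectivity(lambda r: p1(r) and p2(r))

print(independent_estimate)  # 0.5625
print(true_joint)            # 0.5
```

Even on this tiny example the independence estimate overstates the joint selectivity; on real correlated data the gap, and hence the risk of a bad plan choice, can be much larger.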
- Journal article: The Berlin Big Data Center (BBDC) (it - Information Technology: Vol. 60, No. 5-6, 2018). Boden, Christoph; Rabl, Tilmann; Markl, Volker.
  The last decade has been characterized by the collection and availability of unprecedented amounts of data, due to rapidly decreasing storage costs and the omnipresence of sensors and data-producing global online services. In order to process and analyze this data deluge, novel distributed data processing systems resting on the paradigm of data flow, such as Apache Hadoop, Apache Spark, or Apache Flink, were built and have been scaled to tens of thousands of machines. However, writing efficient implementations of data analysis programs on these systems requires a deep understanding of systems programming, preventing large groups of data scientists and analysts from efficiently using this technology. In this article, we present some of the main achievements of the research carried out by the Berlin Big Data Center (BBDC). We introduce the two domain-specific languages Emma and LARA, which are deeply embedded in Scala and enable declarative specification and automatic parallelization of data analysis programs, as well as the PEEL framework for transparent and reproducible benchmark experiments on distributed data processing systems; we present approaches to foster the interpretability of machine learning models; and finally we provide an overview of the challenges to be addressed in the second phase of the BBDC.
- Journal article: Big Data (Wirtschaftsinformatik: Vol. 56, No. 5, 2014). Schermann, Michael; Hemsen, Holmer; Buchmüller, Christoph; Bitter, Till; Krcmar, Helmut; Markl, Volker; Hoeren, Thomas.
  "Big data" describes technologies that promise to fulfill a fundamental tenet of research in information systems, which is to provide the right information to the right receiver, in the right volume and quality, at the right time. For information systems research as an application-oriented research discipline, both opportunities and risks arise from big data. Risks arise primarily because considerable resources may be spent on explaining and designing mere fads. Opportunities arise because these resources can lead to substantial knowledge gains, which support scientific progress within the discipline and are of relevance to practice as well. From the authors' perspective, information systems research is ideally positioned to accompany big data critically and to use the knowledge gained to explain and design innovative information systems in business and administration, regardless of whether big data is in reality a disruptive technology or a cursory fad. The continuing development and adoption of big data will ultimately provide clarity on whether big data is a fad or represents substantial progress. Three theses also show how future technological developments can be used to advance the discipline of information systems: technological progress should be used as a cumulative supplement to existing models, tools, and methods. By contrast, scientific revolutions are independent of technological progress.
- Conference paper: Big Data-Zentren - Vorstellung und Panel (Datenbanksysteme für Business, Technologie und Web (BTW 2015), 2015). Markl, Volker; Rahm, Erhard; Lehner, Wolfgang; Beigl, Michael; Seidl, Thomas.
  Three centers were recently founded to investigate the various facets of "big data": the competence centers BBDC (Berlin Big Data Center, led by TU Berlin) and ScaDS (Competence Center for Scalable Data Services and Solutions, led by TU Dresden and Uni Leipzig), both funded by the Federal Ministry of Education and Research (BMBF), as well as the SDIL (Smart Data Innovation Lab, led by KIT), which was established as a collaboration between industry and research. These three centers are first introduced in short talks. A subsequent panel discussion works out commonalities, specific requirements, and opportunities for cooperation.
- Conference paper: Breaking the chains: on declarative analysis and independence in the big data era (Informatik 2014, 2014). Markl, Volker.
  Data management research, systems, and technologies have drastically improved the availability of data analysis capabilities, particularly for non-experts, due in part to low entry barriers and reduced ownership costs (e.g., for data management infrastructures and applications). Major reasons for the widespread success of database systems and today's multi-billion dollar data management market include data independence, which separates the physical representation and storage from the actual information, and declarative languages, which separate the program specification from its intended execution environment. In contrast, today's big data solutions offer neither data independence nor declarative specification. As a result, big data technologies are mostly employed in newly established companies with IT-savvy employees or in large, well-established companies with big IT departments. We argue that current big data solutions will continue to fall short of widespread adoption due to usability problems, despite the fact that in-situ data analytics technologies achieve a good degree of schema independence. In particular, we consider the lack of a declarative specification to be a major roadblock, contributing to the scarcity of available data scientists and limiting the application of big data to IT-savvy industries. Moreover, data scientists currently have to spend a lot of time tuning their data analysis programs for specific data characteristics and a specific execution environment.
- Text document: A Comparison of Distributed Stream Processing Systems for Time Series Analysis (BTW 2019 – Workshopband, 2019). Gehring, Melissa; Charfuelan, Marcela; Markl, Volker.
  Given the vast number of data processing systems available today, in this paper we aim to identify, select, and evaluate systems to determine which is best suited for conducting time series analysis. Published performance studies are used to compare several open-source systems, and two systems are further selected for qualitative comparison and evaluation regarding the development of a time series analytics task. The main interest of this work lies in investigating ease of development. As a test scenario, a discrete Kalman filter is implemented to predict the closing price of stock market data in real time. Basic functionality coverage is considered, and advanced functionality is evaluated using several qualitative comparison criteria.
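The test scenario mentioned above, a discrete Kalman filter applied to a price series, can be sketched in a few lines. This is a generic one-dimensional random-walk filter; the noise parameters and prices below are illustrative assumptions, not values taken from the paper.

```python
def kalman_1d(measurements, q=1e-3, r=0.1**2):
    """Filter a noisy scalar series; q = process noise, r = measurement noise."""
    x, p = measurements[0], 1.0   # initial state estimate and error covariance
    estimates = [x]
    for z in measurements[1:]:
        # Predict: random-walk model, state unchanged, uncertainty grows
        p = p + q
        # Update: blend the prediction with the new measurement
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Hypothetical closing prices
prices = [100.0, 100.4, 99.8, 100.9, 101.3, 100.7]
print(kalman_1d(prices))
```

In a streaming setting, the predict/update step would run once per incoming price event; each estimate is a convex combination of the previous estimate and the new measurement, so the filtered series stays within the range of the observations.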
- Conference paper: Composition methods for link discovery (Datenbanksysteme für Business, Technologie und Web (BTW), 2013). Hartung, Michael; Groß, Anika; Rahm, Erhard.
  The Linked Open Data community publishes an increasing number of data sources on the so-called Data Web and interlinks them to support data integration applications. We investigate how the composition of existing links and mappings can help to discover new links and mappings between LOD sources. Often there will be many alternatives for composition, so the problem arises of which paths can provide the best linking results with the least computational effort. We therefore investigate different methods to select and combine the most suitable mapping paths. We also propose an approach for selecting and composing individual links instead of entire mappings. We comparatively evaluate the methods on several real-world linking problems from the LOD cloud. The results show the high value of reusing and composing existing links as well as the high effectiveness of our methods.
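The core idea of composing existing mappings can be sketched as follows (an illustrative toy example, not the paper's method; the entity identifiers are assumptions): given a mapping from source A to source B and one from B to C, candidate links from A to C are obtained by chaining them.

```python
def compose(map_ab, map_bc):
    """Compose two entity mappings, each a dict {source: set(targets)}."""
    map_ac = {}
    for a, bs in map_ab.items():
        targets = set()
        for b in bs:
            # Follow every B-side entity into the second mapping
            targets |= map_bc.get(b, set())
        if targets:
            map_ac[a] = targets
    return map_ac

# Hypothetical links between three LOD sources
dbpedia_to_geonames = {"dbp:Berlin": {"geo:2950159"}}
geonames_to_freebase = {"geo:2950159": {"fb:m.0156q"}}

print(compose(dbpedia_to_geonames, geonames_to_freebase))
# {'dbp:Berlin': {'fb:m.0156q'}}
```

With many sources there are many such composition paths, which is exactly the selection problem the abstract raises: which paths yield the best new links at the least cost.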
- Journal article: Continuous Training and Deployment of Deep Learning Models (Datenbank-Spektrum: Vol. 21, No. 3, 2021). Prapas, Ioannis; Derakhshan, Behrouz; Mahdiraji, Alireza Rezaei; Markl, Volker.
  Deep Learning (DL) has consistently surpassed other Machine Learning methods and achieved state-of-the-art performance in multiple cases. Several modern applications like financial and recommender systems require models that are constantly updated with fresh data. The prominent approach for keeping a DL model fresh is to trigger full retraining from scratch when enough new data are available. However, retraining large and complex DL models is time-consuming and compute-intensive. This makes full retraining costly, wasteful, and slow. In this paper, we present an approach to continuously train and deploy DL models. First, we enable continuous training through proactive training that combines samples of historical data with new streaming data. Second, we enable continuous deployment through gradient sparsification, which allows us to send a small percentage of the model updates per training iteration. Our experimental results with LeNet5 on MNIST and modern DL models on CIFAR-10 show that proactive training keeps models fresh with comparable, if not superior, performance to full retraining at a fraction of the time. Combined with gradient sparsification, sparse proactive training enables very fast updates of a deployed model with arbitrarily large sparsity, reducing communication per iteration up to four orders of magnitude, with minimal, if any, losses in model quality. Sparse training, however, comes at a price; it incurs overhead on the training that depends on the size of the model and increases the training time by factors ranging from 1.25 to 3 in our experiments. Arguably, this is a small price to pay for successfully enabling the continuous training and deployment of large DL models.
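Gradient sparsification as described above, sending only a small fraction of the update per iteration, is commonly realized as top-k selection with a locally accumulated residual (error feedback). The following is a minimal sketch of that general technique, not the paper's implementation; all gradient values are illustrative.

```python
def sparsify_topk(grad, residual, k):
    """Return (sparse_update, new_residual), keeping the k largest-magnitude entries."""
    # Add back what was withheld in earlier iterations (error feedback)
    full = [g + r for g, r in zip(grad, residual)]
    # Indices of the k entries with the largest magnitude
    top = set(sorted(range(len(full)), key=lambda i: abs(full[i]), reverse=True)[:k])
    # Transmit only the selected entries; zero out the rest
    sparse = [v if i in top else 0.0 for i, v in enumerate(full)]
    # Withheld mass is carried over to the next iteration
    new_residual = [f - s for f, s in zip(full, sparse)]
    return sparse, new_residual

grad = [0.5, -0.02, 0.31, 0.007, -0.9]
residual = [0.0] * len(grad)
sparse, residual = sparsify_topk(grad, residual, k=2)
print(sparse)    # [0.5, 0.0, 0.0, 0.0, -0.9]
print(residual)  # [0.0, -0.02, 0.31, 0.007, 0.0]
```

Only 2 of 5 entries are transmitted here; at the scale of modern DL models the same idea can cut communication per iteration by orders of magnitude, while the residual ensures withheld updates are eventually applied.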
- Journal article: Das Fachgebiet „Datenbanksysteme und Informationsmanagement“ (DIMA) an der Technischen Universität Berlin stellt sich vor (Datenbank-Spektrum: Vol. 11, No. 2, 2011). Li, Yuexiao; Markl, Volker.
  In this article, we introduce the Database Systems and Information Management (DIMA) group at Technische Universität Berlin, headed by Prof. Dr. Volker Markl. We describe current research projects and give an overview of the group's courses.