Latest publications
- Journal article: The Data Mining Group at University of Vienna (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Altinigneli, Can; Bauer, Lena Greta Marie; Behzadi, Sahar; Fritze, Robert; Hlaváčková-Schindler, Kateřina; Leodolter, Maximilian; Miklautz, Lukas; Perdacher, Martin; Sadikaj, Ylli; Schelling, Benjamin; Plant, Claudia. How can we extract meaningful knowledge from massive amounts of data? The data mining group at the University of Vienna contributes novel methods for exploratory data analysis. Our main research focus is unsupervised learning, where we aim to identify any kind of non-random structure or pattern in the data without restricting ourselves to a predefined target variable or analysis goal. Our major current lines of research are clustering, causality detection, and highly efficient exploratory analysis of massive data. In addition, we develop application-specific methods that address particular challenges in biomedicine, neuroscience, and environmental sciences. In teaching, we offer fundamental and advanced courses in data mining, machine learning, and scientific data management for Bachelor's and Master's students in computer science and related programs.
- Journal article: Evaluation Infrastructures for Academic Shared Tasks (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Schaible, Johann; Breuer, Timo; Tavakolpoursaleh, Narges; Müller, Bernd; Wolff, Benjamin; Schaer, Philipp. Academic search systems help users find information on specific topics of scientific interest and have evolved from early catalog-based library systems to modern web-scale systems. However, evaluating the performance of the underlying retrieval approaches remains a challenge. A growing number of requirements must be considered to produce accurate retrieval results, e.g., close integration of the system's users. Because of these requirements, small to mid-size academic search systems cannot evaluate their retrieval approaches in-house. Evaluation infrastructures for shared tasks alleviate this situation: they allow researchers to experiment with retrieval approaches in specific search and recommendation scenarios without building their own infrastructure. In this paper, we discuss the benefits and shortcomings of four state-of-the-art evaluation infrastructures for search and recommendation tasks with respect to the following requirements: support for online and offline evaluation, domain specificity of shared tasks, and reproducibility of experiments and results. In addition, we introduce a concept design for an evaluation infrastructure that aims to reduce these shortcomings in shared tasks for search and recommender systems.
- Journal article: Data Lakes auf den Grund gegangen [Getting to the Bottom of Data Lakes] (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Giebler, Corinna; Gröger, Christoph; Hoos, Eva; Eichler, Rebecca; Schwarz, Holger; Mitschang, Bernhard. Companies increasingly face the challenge of managing large, heterogeneous data and extracting the value it contains. In recent years, the data lake has therefore emerged as a novel concept for managing and exploiting such complex data. However, when companies want to implement a data lake in practice, they encounter a variety of challenges, such as contradictory definitions or vague and missing concepts. In this article, concrete projects of a globally operating industrial company are used to identify existing challenges and to derive requirements for data lakes. These requirements are compared with the available literature on data lakes and with existing research approaches. The comparison shows that there are five major research gaps: 1. unclear data modeling methods, 2. a missing data lake reference architecture, 3. an incomplete metadata management concept, 4. an incomplete data lake governance concept, and 5. a missing holistic implementation strategy.
- Journal article: Editorial (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Schaer, Philipp; Berberich, Klaus; Härder, Theo.
- Journal article: GridTables: A One-Size-Fits-Most H2TAP Data Store (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Pinnecke, Marcus; Campero Durand, Gabriel; Broneske, David; Zoun, Roman; Saake, Gunter. Heterogeneous Hybrid Transactional Analytical Processing (H²TAP) database systems have been developed to meet the requirements of low-latency analysis of real-time operational data. Due to technical challenges, these systems are hard to architect, non-trivial to engineer, and complex to administer. Current research has proposed excellent solutions to many of these challenges in isolation; however, a unified engine that combines these solutions to optimize performance is still missing. In this concept paper, we propose a highly flexible and adaptive data structure (called gridtable) to physically organize sparse but structured records in the context of H²TAP. To this end, we focus on the design of an efficient, highly flexible storage layout that is built from scratch for mixed query workloads. The key challenges we address are (1) partial storage in different memory locations and (2) the ability to optimize for mixed OLTP/OLAP access patterns. To guarantee safe and well-specified data definition and manipulation, as well as fast querying without compromising performance, we propose two dedicated access paths to the storage. We explore the architecture and internals of gridtables, presenting their design goals, concepts, and trade-offs. We close with open research questions and challenges that must be addressed to take full advantage of the flexibility of our solution.
- Journal article: Comparing Wizard of Oz & Observational Studies for Conversational IR Evaluation (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Elsweiler, David; Frummet, Alexander; Harvey, Morgan. Systematic and repeatable measurement of information systems via test collections (the Cranfield model) has been the mainstay of Information Retrieval since the 1960s. However, this approach may not be appropriate for newer, more interactive systems such as conversational search agents. Such systems rely on machine learning technologies that are not yet advanced enough to permit truly human-like dialogues, so research can instead be enabled by having human agents simulate the system. In this work, we compare dialogues obtained from two studies with the same context, assistance in the kitchen, but different experimental setups, which allows us to learn about and evaluate conversational IR systems. We find that users adapt their behaviour when they believe they are interacting with a system, and that the human-like conversations in one of the studies were unpredictable to an extent we did not expect. Our results have implications for the development of new studies in this area and, ultimately, for the design of future conversational agents.
- Journal article: Humans Optional? Automatic Large-Scale Test Collections for Entity, Passage, and Entity-Passage Retrieval (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Dietz, Laura; Dalton, Jeff. Manually creating test collections is a time-, effort-, and cost-intensive process. This paper describes a fully automatic alternative for deriving large-scale test collections that requires no human assessments. Empirical experiments confirm that the automatic test collection and manual assessments agree on the best-performing systems. The collection includes relevance judgments for both text passages and knowledge base entities. Since test collections with relevance data for both entities and text passages are rare, this approach provides a cost-efficient way to train and evaluate ad hoc passage retrieval, entity retrieval, and entity-aware text retrieval methods.
- Journal article: News (Datenbank-Spektrum: Vol. 20, No. 1, 2020)
- Journal article: Dissertationen [Dissertations] (Datenbank-Spektrum: Vol. 20, No. 1, 2020)
- Journal article: Studies on Search: Designing Meaningful IIR Studies on Commercial Search Engines (Datenbank-Spektrum: Vol. 20, No. 1, 2020). Lewandowski, Dirk; Sünkler, Sebastian; Schultheiß, Sebastian. The purpose of this paper is (1) to show which topics are especially fruitful for researchers interested in user behavior in commercial search engines and (2) to help researchers decide which data to collect and to what extent. We classify potential areas for IIR research along two dimensions: the type of interaction data used (small-scale or large-scale) and whether search engine companies are likely to publish research on the chosen topic (likely or unlikely). This results in a framework of five areas, each of which is described in more detail. In the second part of the paper, we present empirical studies showing how researchers could approach relevant topics on which the search engine providers themselves publish no results. We also show how researchers can improve the evidential value of their work by moving from small-scale to at least medium-scale studies.