Browsing by author "Michel, Sebastian"
1 - 9 of 9
- Conference paper: Algebraic Query Optimization for Distributed Top-k Queries (Datenbanksysteme in Business, Technologie und Web (BTW 2007) – 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 2007). Neumann, Thomas; Michel, Sebastian. Distributed top-k query processing is increasingly becoming an essential functionality in a large number of emerging application classes. This paper addresses the efficient algebraic optimization of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We use a dynamic programming approach to find the optimal execution plan, using compact data synopses for the selectivity estimation that underlies our cost model. The optimized query is executed in a hierarchical way involving a small and fixed number of communication phases. We have performed experiments on real web data that show the benefits of distributed top-k query optimization both in network resource consumption and query response time.
- Journal article: Die Arbeitsgruppen für Datenbanken und Informationssysteme an der TU Kaiserslautern (Datenbank-Spektrum: Vol. 16, No. 3, 2016). Deßloch, Stefan; Härder, Theo; Michel, Sebastian. In this article, we give an overview of the research projects in the area of databases and information systems (DBIS) that have been carried out at TU Kaiserslautern in recent years, before outlining our current research topics. We also describe our DBIS-related teaching duties for the bachelor's and master's programs, which are offered in the Information Systems teaching area of the Department of Computer Science.
- Journal: Editorial (Datenbank-Spektrum: Vol. 18, No. 2, 2018). Michel, Sebastian; Gemulla, Rainer; Schenkel, Ralf; Härder, Theo
- Conference paper: Exploring Databases via Reverse Engineering Ranking Queries with PALEO (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Panev, Kiril; Michel, Sebastian; Milchevski, Evica; Pal, Koninika. A novel approach to explore databases using ranked lists is demonstrated. Working with ranked lists, capturing the relative performance of entities, is a very intuitive and widely applicable concept. Users can post lists of entities for which explanatory SQL queries and full result lists are returned. By refining the input, the results, or the queries, users can interactively explore the database content. The demonstrated system was previously presented at VLDB 2016 and is centered around our PALEO framework for reverse engineering OLAP-style database queries. How is this useful for exploring data?
- Conference paper: The MINERVA project: database selection in the context of P2P search (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005). Bender, Matthias; Michel, Sebastian; Weikum, Gerhard; Zimmer, Christian. This paper presents the MINERVA project that prototypes a distributed search engine based on P2P techniques. MINERVA is layered on top of a Chord-style overlay network and uses a powerful crawling, indexing, and search engine on every autonomous peer. We formalize our system model and identify the problem of efficiently selecting promising peers for a query as a pivotal issue. We revisit existing approaches to the database selection problem and adapt them to our system environment. Measurements are performed to compare different selection strategies using real-world data. The experiments show significant performance differences between the strategies and prove the importance of a judicious peer selection strategy. The experiments also present first evidence that a small number of carefully selected peers already provide the vast majority of all relevant results.
- Conference paper: Reverse Engineering Top-k Join Queries (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Panev, Kiril; Weisenauer, Nico; Michel, Sebastian. Ranked lists have become a fundamental tool to represent the most important items taken from a large collection of data. Search engines, sports leagues, and e-commerce platforms present their results, most successful teams, and most popular items in a concise and structured way by making use of ranked lists. This paper introduces the PALEO-J framework, which is able to reconstruct top-k database queries, given only the original query output in the form of a ranked list and the database itself. The query to be reverse engineered may contain a wide range of aggregation functions and an arbitrary number of equality joins, joining several database relations. The challenge of this work is to reconstruct complex queries as fast as possible while operating on large databases and given only the small amount of information provided by the top-k list of entities serving as input. The core contribution is identifying the join predicates in reverse engineering top-k OLAP queries. Furthermore, we introduce several optimizations and an advanced classification system to reduce the execution time of the algorithm. Experiments conducted on a large database show the performance of the presented approach and confirm the benefits of our optimizations.
- Journal article: Sequoia—An Approach to Declarative Information Retrieval (Datenbank-Spektrum: Vol. 12, No. 2, 2012). Pinkel, Christoph; Alvanaki, Foteini; Michel, Sebastian. In this work, we propose an approach that allows querying heterogeneous data sources on the Web in a declarative fashion. Such an approach provides a generic way to formulate various information needs, much more powerful than simple keyword queries. Particularly appealing is the ability to combine (join) information from different sources and the ability to compute simple statistics that can be used to select promising information pieces. What might sound like a hopeless effort due to the inherent complexity expressible by SQL-style queries is at second glance not complicated to understand and to use. Already very simple combinations (i.e., joins) of different data sources (i.e., tables) offer a surprisingly large set of interesting use cases. A particular example is sliding window joins that limit the scope of interest to recent information obtained, for instance, from the live stream of Twitter tweets. This goes far beyond keyword queries enriched with operators like allintext: or allintitle: or site:, as can be used, for instance, in the Google search engine.
- Conference paper: Top-k aggregation queries in large-scale distributed systems (Datenbanksysteme in Business, Technologie und Web (BTW) – 13. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 2009). Michel, Sebastian. Distributed top-k query processing has become an essential functionality in a large number of emerging application classes like Internet traffic monitoring and Peer-to-Peer Web search. This work addresses efficient algorithms for distributed top-k querie
- Conference paper: Tracking hot-k items over web 2.0 streams (Datenbanksysteme für Business, Technologie und Web (BTW), 2011). Haghani, Parisa; Michel, Sebastian; Aberer, Karl. The rise of the Web 2.0 has made content publishing easier than ever. Yesterday's passive consumers are now active users who generate and contribute new data to the web at an immense rate. We consider evaluating data-driven aggregation queries which arise in Web 2.0 applications. In this context, each user action is interpreted as an event in a corresponding stream, e.g., a particular weblog feed or a photo stream. The presented approach continuously tracks the most popular tags attached to the incoming items and, based on this, constructs a dynamic top-k query. By continuous evaluation of this query on the incoming stream, we are able to retrieve the currently hottest items. To limit the query processing cost, we propose to pre-aggregate index lists for parts of the query, which are later used to construct the full query result. As it is prohibitively expensive to materialize lists for all possible combinations, we select those tag sets that are most beneficial for the expected performance gain, based on predictions leveraging traditional FM sketches. To demonstrate the suitability of our approach, we perform a performance evaluation using a real-world dataset obtained from a weblog crawl.