Browsing by author "Thiele, Maik"
1 - 9 of 9
- Conference paper: Data-Warehousing 3.0 – Die Rolle von Data-Warehouse-Systemen auf Basis von In-Memory-Technologie (IMDM 2011 – Proceedings zur Tagung Innovative Unternehmensanwendungen mit In-Memory Data Management, 2011). Thiele, Maik; Lehner, Wolfgang; Habich, Dirk. In this paper we address the question of what role current hardware and software trends for database systems play as enablers of novel concepts in data warehousing. We regard close coupling with operational systems as the central step in the evolution of data warehousing, since it allows direct feedback into, or embedding in, operational business processes. We discuss how in-memory technology supports or enables the concept of real-time DWH systems. To this end, we first present a reference architecture for DWH systems that specifically accounts for push- and pull-based data provisioning. Second, we discuss the concrete role of in-memory systems with regard to specific aspects: optional persistence layers, reduction of the batch size, the positioning of in-memory techniques for building a corporate memory, and the fast provisioning of external data sets to support situational BI scenarios.
- Conference paper: DrillBeyond: open-world SQL queries using web tables (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013). Eberius, Julian; Thiele, Maik; Braunschweig, Katrin; Lehner, Wolfgang. The Web consists of a huge number of documents, but also of large amounts of structured information, for example in the form of HTML tables containing relational-style data. One typical usage scenario for this kind of data is its integration into a database or data warehouse in order to apply data analytics. However, today's business intelligence tools evidently lack support for so-called situational or ad-hoc data integration. In this demonstration we therefore present DrillBeyond, a novel database and information retrieval engine that allows users to query a local database as well as web datasets in a seamless and integrated way with standard SQL. The audience will be able to pose queries to our DrillBeyond system, which will be answered partly from local data in the database and partly from datasets that originate from the Web of Data. We will demonstrate the integration of the web tables back into the DBMS in order to apply its analytical features.
- Journal article: Echtzeit-Data-Warehouse-Systeme (Datenbank-Spektrum: Vol. 11, No. 3, 2011). Thiele, Maik; Lehner, Wolfgang. The increasingly central role of data warehouses at all decision-making levels of a company leads to a demand for highly up-to-date data and thus for real-time-capable data warehouse systems. This article asks to what extent existing data warehouse architectures can guarantee information provisioning in real time, exposes the weaknesses of these architectures, and discusses several approaches to addressing them.
- Conference paper: Enhancing named entity extraction by effectively incorporating the crowd (Datenbanksysteme für Business, Technologie und Web (BTW) 2013 - Workshopband, 2013). Braunschweig, Katrin; Thiele, Maik; Eberius, Julian; Lehner, Wolfgang. Named entity extraction is an established research area in the field of information extraction. When tailored to a specific domain and given sufficient pre-labeled training data, state-of-the-art extraction algorithms have achieved near-human performance. However, when presented with semi-structured data, informal text, or unknown domains where training data is not available, extraction results can deteriorate significantly. Recent research has focused on crowdsourcing as an alternative to automatic named entity extraction or as a tool to generate the required training data. While humans easily adapt to semi-structured data and informal style, a crowd-based approach also introduces new issues due to monetary costs or spamming. We address these issues by combining automatic named entity extraction algorithms with crowdsourcing into a hybrid approach. We have conducted a wide range of experiments on real-world data to identify a set of subtasks or operators that can be performed either by the crowd or automatically. Results show that a meaningful combination of these operators into complex processing pipelines can significantly enhance the quality of named entity extraction in challenging scenarios, while at the same time reducing the monetary costs of crowdsourcing and the risk of misuse.
- Text document: Explore FREDDY: Fast Word Embeddings in Database Systems (BTW 2019, 2019). Günther, Michael; Thiele, Maik; Lehner, Wolfgang; Yanakiev, Zdravko. Word embeddings encode many semantic as well as syntactic features and are therefore useful in many tasks, especially in Natural Language Processing and Information Retrieval. FREDDY (Fast woRd EmbedDings Database sYstems) is an extended PostgreSQL database system that allows the user to analyze structured knowledge in the database relations together with unstructured text corpora encoded as word embeddings by introducing novel operations for similarity calculation and analogy inference. Approximation techniques support these operations to perform fast similarity computations on high-dimensional vector spaces. This demo allows exploring the powerful query capabilities of FREDDY on different database schemas and a variety of word embeddings generated on different text corpora. From a systems perspective, the user is able to examine the impact of multiple approximation techniques and their parameters for similarity search on query execution time and precision.
- Text document: Fast Approximated Nearest Neighbor Joins For Relational Database Systems (BTW 2019, 2019). Günther, Michael; Thiele, Maik; Lehner, Wolfgang. K nearest neighbor search (kNN-Search) is a universal data processing technique and a fundamental operation for word embeddings trained by word2vec or related approaches. The benefits of operations on dense vectors like word embeddings for analytical functionalities of RDBMSs motivate an integration of kNN-Joins. However, kNN-Search, as well as kNN-Joins, have barely been integrated into relational database systems so far. In this paper, we develop an index structure for approximated kNN-Joins that works well on high-dimensional data and provide an integration into PostgreSQL. The novel index structure is efficient for different cardinalities of the involved join partners. An evaluation of the system based on applications on word embeddings shows the benefits of such an integrated kNN-Join operation and the performance of the proposed approach.
- Conference paper: Multi-objective scheduling for real-time data warehouses (Datenbanksysteme in Business, Technologie und Web (BTW) – 13. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS), 2009). Thiele, Maik; Bader, Andreas; Lehner, Wolfgang. The issue of write-read contention is one of the most prevalent problems when deploying real-time data warehouses. With increasing load, updates are increasingly delayed and previously fast queries tend to be slowed down considerably. However, depending o
- Journal article: OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data (Datenbank-Spektrum: Vol. 12, No. 2, 2012). Braunschweig, Katrin; Eberius, Julian; Thiele, Maik; Lehner, Wolfgang. Government initiatives for more transparency and participation have led to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people's awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support. On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. To do so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects from both classic approaches. In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data.
- Journal article: Season- and Trend-aware Symbolic Approximation for Accurate and Efficient Time Series Matching (Datenbank-Spektrum: Vol. 21, No. 3, 2021). Kegel, Lars; Hartmann, Claudio; Thiele, Maik; Lehner, Wolfgang. Processing and analyzing time series datasets have become a central issue in many domains, requiring data management systems to support time series as a native data type. A core access primitive of time series is matching, which requires efficient algorithms on top of appropriate representations such as the symbolic aggregate approximation (SAX), which represents the current state of the art. This technique reduces a time series to a low-dimensional space by segmenting it and discretizing each segment into a small symbolic alphabet. Unfortunately, SAX ignores the deterministic behavior of time series, such as cyclical repeating patterns or a trend component affecting all segments, which may lead to sub-optimal representation accuracy. We therefore introduce a novel season- and trend-aware symbolic approximation and demonstrate an improved representation accuracy without increasing the memory footprint. Most importantly, our techniques also enable more efficient time series matching by providing a match up to three orders of magnitude faster than SAX.
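
As an illustration of the baseline mentioned in the last abstract, the following is a minimal sketch of classic SAX (segment the series, then discretize each segment mean into a small alphabet). It is not the season- and trend-aware variant proposed by the authors. The function name `sax`, its parameters, and the choice of equiprobable Gaussian breakpoints are assumptions made for this example.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev
import string


def sax(series, n_segments, alphabet_size):
    """Classic SAX sketch: z-normalize, average per segment, map to symbols.

    Hypothetical helper for illustration only; assumes len(series) is a
    multiple of n_segments and 2 <= alphabet_size <= 26.
    """
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")

    # Step 1: z-normalize so the values roughly follow a standard normal.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]

    # Step 2: segment the series and keep one mean value per segment.
    seg_len = len(series) // n_segments
    segment_means = [
        mean(normalized[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]

    # Step 3: breakpoints that split the standard normal into
    # alphabet_size equiprobable regions.
    breakpoints = [
        NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]

    # Step 4: a segment's symbol is the index of the region its mean falls in.
    symbols = string.ascii_lowercase
    return "".join(symbols[bisect_right(breakpoints, m)] for m in segment_means)


if __name__ == "__main__":
    # A pure upward trend: every segment gets a different symbol.
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
```

In this sketch a linear trend alone consumes the whole alphabet, which hints at why a representation that models trend and season separately could spend its symbols on the remaining variation instead.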