Browsing by author "Eberius, Julian"
- Text document: Datenintegration zur Anfragezeit. (Ausgezeichnete Informatikdissertationen 2015, 2015) Eberius, Julian. In the big data era, new data is often collected at a speed that no longer permits classical integration with static ETL processes and global schemas. This thesis introduces the principle of query-time data integration, which aims to integrate additional external data sources at the runtime of a database query and to present them directly in the query result. To achieve this goal, a series of new methods, algorithms, and systems was developed. The first is a top-k entity augmentation system that makes it possible to extend a dataset ad hoc with new attributes. Building on this, a database system was extended to process so-called open-world SQL queries, that is, queries that go beyond the defined schema. The final component is a data curation system that aims to increase the individual reusability of heterogeneous data collections for ad hoc integration, without presupposing a central schema.
- Conference paper: DrillBeyond: open-world SQL queries using web tables (Datenbanksysteme für Business, Technologie und Web (BTW) 2013, 2013) Eberius, Julian; Thiele, Maik; Braunschweig, Katrin; Lehner, Wolfgang. The Web consists of a huge number of documents, but also of large amounts of structured information, for example in the form of HTML tables containing relational-style data. One typical usage scenario for this kind of data is its integration into a database or data warehouse in order to apply data analytics. However, today's business intelligence tools evidently lack support for so-called situational or ad-hoc data integration. In this demonstration we will therefore present DrillBeyond, a novel database and information retrieval engine which allows users to query a local database as well as web datasets in a seamless and integrated way with standard SQL. The audience will be able to pose queries to our DrillBeyond system, which will be answered partly from local data in the database and partly from datasets that originate from the Web of Data. We will demonstrate the integration of the web tables back into the DBMS in order to apply its analytical features.
- Conference paper: Enhancing named entity extraction by effectively incorporating the crowd (Datenbanksysteme für Business, Technologie und Web (BTW) 2013 - Workshopband, 2013) Braunschweig, Katrin; Thiele, Maik; Eberius, Julian; Lehner, Wolfgang. Named entity extraction is an established research area in the field of information extraction. When tailored to a specific domain and given sufficient pre-labeled training data, state-of-the-art extraction algorithms have achieved near-human performance. However, when presented with semi-structured data, informal text, or unknown domains where training data is not available, extraction results can deteriorate significantly. Recent research has focused on crowdsourcing as an alternative to automatic named entity extraction or as a tool to generate the required training data. While humans easily adapt to semi-structured data and informal style, a crowd-based approach also introduces new issues due to monetary costs or spamming. We address these issues by combining automatic named entity extraction algorithms with crowdsourcing into a hybrid approach. We have conducted a wide range of experiments on real-world data to identify a set of subtasks or operators that can be performed either by the crowd or automatically. Results show that a meaningful combination of these operators into complex processing pipelines can significantly enhance the quality of named entity extraction in challenging scenarios, while at the same time reducing the monetary costs of crowdsourcing and the risk of misuse.
- Journal article: OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Braunschweig, Katrin; Eberius, Julian; Thiele, Maik; Lehner, Wolfgang. Government initiatives for more transparency and participation have led to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people’s awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support. On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. To do so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects from both classic approaches. In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data.