Datenbank Spektrum 12(2) - July 2012

  • Journal Article
    Editorial
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Balke, Wolf-Tilo
  • Journal Article
    OPEN—Enabling Non-expert Users to Extract, Integrate, and Analyze Open Data
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Braunschweig, Katrin; Eberius, Julian; Thiele, Maik; Lehner, Wolfgang
    Government initiatives for more transparency and participation have led to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people’s awareness. Unfortunately, the ad-hoc analysis of this so-called Open Data can prove very complex and time-consuming, partly due to a lack of efficient system support. On the one hand, search functionality is required to identify relevant datasets. Common document retrieval techniques used in web search, however, are not optimized for Open Data and do not address the semantic ambiguity inherent in it. On the other hand, semantic integration is necessary to perform analysis tasks across multiple datasets. To do so in an ad-hoc fashion, however, requires more flexibility and easier integration than most data integration systems provide. It is apparent that an optimal management system for Open Data must combine aspects from both classic approaches. In this article, we propose OPEN, a novel concept for the management and situational analysis of Open Data within a single system. In our approach, we extend a classic database management system, adding support for the identification and dynamic integration of public datasets. As most web users lack the experience and training required to formulate structured queries in a DBMS, we add support for non-expert users to our system, for example through keyword queries. Furthermore, we address the challenge of indexing Open Data.
  • Journal Article
    Sequoia—An Approach to Declarative Information Retrieval
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Pinkel, Christoph; Alvanaki, Foteini; Michel, Sebastian
    In this work, we propose an approach that allows heterogeneous data sources on the Web to be queried in a declarative fashion. Such an approach provides a generic way to formulate a wide range of information needs, far more powerful than simple keyword queries. Particularly appealing are the ability to combine (join) information from different sources and the ability to compute simple statistics that can be used to select promising pieces of information. What might sound like a hopeless effort, given the inherent complexity expressible by SQL-style queries, turns out at second glance to be straightforward to understand and to use. Even very simple combinations (i.e., joins) of different data sources (i.e., tables) offer a surprisingly large set of interesting use cases, in particular sliding-window joins that limit the scope of interest to recent information, obtained, for instance, from the live stream of Twitter Tweets. This goes far beyond keyword queries enriched with operators like allintext:, allintitle:, or site:, as used, for instance, in the Google search engine.
  • Journal Article
    Information Extraction Meets Crowdsourcing: A Promising Couple
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Lofi, Christoph; Selke, Joachim; Balke, Wolf-Tilo
    Recent years have brought tremendous advancements in the area of automated information extraction. Still, problem scenarios remain where even state-of-the-art algorithms do not provide a satisfying solution. In these cases, another rising trend can be exploited to achieve the required extraction quality: explicit crowdsourcing of human intelligence tasks. In this paper, we discuss the synergies between information extraction and crowdsourcing. In particular, we methodically identify and classify the challenges and fallacies that arise when combining both approaches. Furthermore, we argue that for harnessing the full potential of either approach, true hybrid techniques must be considered. To demonstrate this point, we showcase such a hybrid technique, which tightly interweaves information extraction with crowdsourcing and machine learning to vastly surpass the abilities of either technique.
  • Journal Article
    Fact-Aware Document Retrieval for Information Extraction
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Boden, Christoph; Löser, Alexander; Nagel, Christoph; Pieper, Stephan
    Exploiting textual information from large document collections such as the Web with structured queries is an often requested, but still unsolved requirement of many users. We present BlueFact, a framework for efficiently retrieving documents containing structured, factual information from a full-text index. This is an essential building block for information extraction systems that enable ad-hoc analytical queries on unstructured text data as well as knowledge harvesting in a digital archive scenario. Our approach is based on the observation that documents share a set of common grammatical structures and words for expressing facts. Our system observes these keyword phrases using structural, syntactic, lexical and semantic features in an iterative, cost-effective training process and systematically queries the search engine index with these automatically generated phrases. Next, BlueFact retrieves a list of document identifiers, combines observed keywords as evidence of factual information and infers the relevance of each document identifier. Finally, we forward the documents in the order of their estimated relevance to an information extraction service. That way BlueFact can efficiently retrieve all the structured, factual information contained in an indexed collection of text documents. We report results of a comprehensive experimental evaluation over 20 different fact types on the Reuters News Corpus Volume I (RCV1). BlueFact’s scoring model and feature generation methods significantly outperform existing approaches in terms of fact retrieval performance. BlueFact fires significantly fewer queries against the index, requires significantly less execution time and achieves very high fact recall across different domains.
  • Journal Article
    News
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012)
  • Journal Article
    Introduction to Information Extraction: Basic Notions and Current Trends
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Balke, Wolf-Tilo
    Transforming unstructured or semi-structured information into structured knowledge is one of the big challenges of today’s knowledge society. While this abstract goal is still unreached and probably unreachable, intelligent information extraction techniques are considered key ingredients on the way to generating and representing knowledge for a wide variety of applications. This is especially true for the current efforts to turn the World Wide Web, the world’s largest collection of information, into the world’s largest knowledge base. This introduction gives a broad overview of the major topics and current trends in information extraction.
  • Journal Article
    Dissertationen
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012)
  • Journal Article
    Die Datenbankforschungsgruppe der Technischen Universität Dresden stellt sich vor
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Lehner, Wolfgang
  • Journal Article
    Verfahren zur funktionalen Ähnlichkeitssuche technischer Bauteile in 3D-Datenbanken
    (Datenbank-Spektrum: Vol. 12, No. 2, 2012) Maier, Moritz; Schulz, Jan; Thoben, Klaus-Dieter
    This article addresses the functional similarity search of technical components. The goal is to present a method that enables similarity search within a bionic 3D database containing a wide variety of structures, without directly accessing the geometry data of the individual structures. Instead, the mechanical functions of the structures are used as the similarity criterion. The method relies on working surfaces and points that represent the entry or exit points of forces in a structure. An automatic similarity search is carried out with the help of various algorithms. The result of the functional similarity search is a visually prepared map on which functionally similar structures appear grouped together and functionally dissimilar ones appear far apart.
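
The sliding-window joins mentioned in the Sequoia abstract above can be illustrated with a short, self-contained sketch. All names, streams, and data below are hypothetical and not taken from the paper; the sketch merely shows the general technique of a symmetric window equi-join over a merged, timestamp-ordered event stream, which limits the scope of interest to recent items:

```python
from collections import deque

def sliding_window_join(events, window):
    """Symmetric sliding-window equi-join.

    `events` is a timestamp-ordered stream of (timestamp, side, key, payload)
    tuples, where side is "L" or "R". An item matches items from the opposite
    side with an equal key whose timestamps differ by at most `window`.
    """
    buffers = {"L": deque(), "R": deque()}  # per-side windows, oldest first
    results = []
    for ts, side, key, payload in events:
        other = "R" if side == "L" else "L"
        # Drop items from the opposite buffer that fell out of the window.
        while buffers[other] and ts - buffers[other][0][0] > window:
            buffers[other].popleft()
        # Probe the surviving opposite-side items for matching join keys.
        for ots, okey, opayload in buffers[other]:
            if okey == key:
                results.append((ts, payload, opayload))
        # Remember this item so later arrivals on the other side can match it.
        buffers[side].append((ts, key, payload))
    return results
```

For example, joining a (hypothetical) tweet stream with a headline stream on a shared keyword with `window=5` pairs a tweet at timestamp 1 with a headline at timestamp 2, but not with one arriving at timestamp 10; real systems such as the one the abstract describes add declarative query syntax and source wrappers on top of this core operator.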