P065 - BTW2005 - Datenbanksysteme in Business, Technologie und Web


Sorted by: newest publications

1 - 10 of 40
  • Conference paper
    Self-extending peer data management
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Heese, Ralf; Herschel, Sven; Naumann, Felix; Roth, Armin
    Peer data management systems (PDMS) are the natural extension of integrated information systems. Conventionally, a single integrating system manages an integrated schema, distributes queries to appropriate sources, and integrates incoming data into a common result. In contrast, a PDMS consists of a set of peers, each of which can play the role of an integrating component. A peer knows about its neighboring peers through mappings, which help to translate queries and transform data. Queries submitted to one peer are answered by data residing at that peer and by data that is reached along paths of mappings through the network of peers. The only restriction keeping a PDMS from covering unbounded data is the need to formulate at least one mapping from some known peer to a new data source. We propose a Semantic Web based method that overcomes this restriction, albeit at a price. As sources are dynamically and automatically included in a PDMS, three factors diminish quality: the new source itself might store data of poor quality, the mapping to the PDMS might be incorrect, and the mapping to the PDMS might be incomplete. To compensate, we propose a quality model to measure this effect, a cost model to restrict query planning to the best paths through the PDMS, and techniques to answer queries in such Web-scale PDMS efficiently.
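    A minimal sketch of quality-bounded path selection over peer mappings (the multiplicative quality model and all Python names are illustrative assumptions of this sketch, not the authors' actual cost model):

        import heapq

        def best_paths(mappings, source, min_quality=0.5):
            """Best-first search through a network of peer mappings.

            mappings: dict peer -> list of (neighbor, mapping_quality in [0, 1]).
            Path quality is modelled multiplicatively (an assumption of this
            sketch); peers reachable only below min_quality are pruned from
            query planning.
            """
            best = {source: 1.0}
            heap = [(-1.0, source)]
            while heap:
                neg_q, peer = heapq.heappop(heap)
                q = -neg_q
                if q < best.get(peer, 0.0):
                    continue  # stale heap entry
                for neighbor, mapping_quality in mappings.get(peer, []):
                    nq = q * mapping_quality
                    if nq >= min_quality and nq > best.get(neighbor, 0.0):
                        best[neighbor] = nq
                        heapq.heappush(heap, (-nq, neighbor))
            return best  # peer -> quality of its best mapping path

        peers = {"A": [("B", 0.9), ("C", 0.6)], "B": [("D", 0.8)], "C": [("D", 0.9)]}
        print(best_paths(peers, "A"))  # {'A': 1.0, 'B': 0.9, 'C': 0.6, 'D': 0.72}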
  • Conference paper
    The MINERVA project: database selection in the context of P2P search
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Bender, Matthias; Michel, Sebastian; Weikum, Gerhard; Zimmer, Christian
    This paper presents the MINERVA project, which prototypes a distributed search engine based on P2P techniques. MINERVA is layered on top of a Chord-style overlay network and uses a powerful crawling, indexing, and search engine on every autonomous peer. We formalize our system model and identify the problem of efficiently selecting promising peers for a query as a pivotal issue. We revisit existing approaches to the database selection problem and adapt them to our system environment. Measurements are performed to compare different selection strategies using real-world data. The experiments show significant performance differences between the strategies and prove the importance of a judicious peer selection strategy. The experiments also present first evidence that a small number of carefully selected peers already provides the vast majority of all relevant results.
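    A minimal sketch of the database selection step as a simplified CORI-style score (the constants and the statistics layout are assumptions of this sketch, not MINERVA's actual formula):

        import math

        def rank_peers(query_terms, peer_stats, num_peers):
            """Rank peers by a simplified CORI-style relevance score.

            peer_stats: dict peer -> dict term -> document frequency at that peer.
            cf[t] counts how many peer collections contain term t at all.
            """
            cf = {t: sum(1 for s in peer_stats.values() if s.get(t, 0) > 0)
                  for t in query_terms}
            scores = {}
            for peer, stats in peer_stats.items():
                score = 0.0
                for t in query_terms:
                    df = stats.get(t, 0)
                    T = df / (df + 50)  # document-frequency component
                    I = (math.log((num_peers + 0.5) / max(cf[t], 1))
                         / math.log(num_peers + 1.0))  # rarity across peers
                    score += 0.4 + 0.6 * T * I
                scores[peer] = score
            return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

        stats = {"p1": {"chord": 120, "dht": 3}, "p2": {"dht": 80}, "p3": {"chord": 5}}
        print(rank_peers(["chord", "dht"], stats, num_peers=3))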
  • Conference paper
    Einen Schritt zurück zum negativen Datenbank-Caching
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Bühmann, Andreas
    Caching is a key to improving the quality of Web applications. While Web caching keeps document fragments that are increasingly generated from database data, database caching targets the redundant storage of this data itself. An adaptively managed subset of the backend data allows queries to be evaluated correctly in the cache by means of completeness properties. In cache groups for equality predicates, one instance of constraint-based database caching, the answerability of a query can be decided by simple probe queries on the cache content. We present a new probing method that makes the cache more flexible and usable for a larger number of queries; under the notion of negative caching, these include queries with empty results. We examine whether the new probing method can be generalized further, which alternatives exist for implementing it in concrete cache groups, and how the previous approach fits in. The new method also exposes a weakness in the previous structure of cache groups, which is remedied by introducing control tables, and it shifts the emphasis in static analysis.
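    A toy model of the probing idea, including the negative case (class and method names are invented for illustration; the paper's cache groups and control tables are far richer):

        class CacheGroup:
            """Probing for equality predicates in a toy cache group."""

            def __init__(self):
                self.complete = {}        # value -> cached rows, complete per value
                self.known_empty = set()  # values with known-empty results

            def probe(self, value):
                """Decide SELECT ... WHERE col = value locally, if possible."""
                if value in self.complete:
                    return True, self.complete[value]
                if value in self.known_empty:
                    return True, []   # negative caching: empty answer, no backend trip
                return False, None    # not answerable, evaluate at the backend

            def load_from_backend(self, value, rows):
                if rows:
                    self.complete[value] = rows   # completeness now holds for value
                else:
                    self.known_empty.add(value)   # remember the empty result

        cache = CacheGroup()
        cache.load_from_backend(42, [])  # backend returned nothing for value 42
        print(cache.probe(42))           # (True, []) -- answered without the backend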
  • Conference paper
    Integrating the relational interval tree into IBM's DB2 Universal Database server
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Brochhaus, Christoph; Enderle, Jost; Schlosser, Achim; Seidl, Thomas; Stolze, Knut
    User-defined data types such as intervals require specialized access methods to be searched and queried efficiently. As database implementors cannot provide appropriate index structures and query processing methods for each conceivable data type, present-day object-relational database systems offer extensible indexing frameworks that enable developers to extend the set of built-in index structures with custom access methods. Although these frameworks permit a seamless integration of user-defined indexing techniques into query processing, they do not facilitate the actual implementation of the access method itself. In order to leverage the applicability of indexing frameworks, relational access methods such as the Relational Interval Tree (RI-tree), an efficient index structure for processing interval intersection queries, rely mainly on the functionality, robustness, and performance of built-in indexes, thus simplifying the index implementation significantly. To investigate the behavior and performance of the recently released IBM DB2 indexing framework, we use this interface to integrate the RI-tree into the DB2 server. The standard implementation of the RI-tree, however, does not fit into the narrow corset of the DB2 framework, which is restricted to the use of a single index only. We therefore present our adaptation of the original two-tree technique to the single-index constraint. As experimental results with interval intersection queries show, the plugged-in access method delivers excellent performance compared to other techniques.
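    A minimal sketch of the underlying RI-tree idea, i.e. registering intervals at nodes of a virtual binary tree and transforming an intersection query into range scans over built-in indexes (this follows the published RI-tree scheme in spirit; the DB2-specific single-index adaptation is not reproduced here):

        def fork_node(lo, hi, root):
            """Virtual tree node at which interval [lo, hi] is registered."""
            node, step = root, root // 2
            while node < lo or node > hi:  # descend until the node falls inside
                node = node - step if node > hi else node + step
                step //= 2
            return node

        def query_nodes(lo, hi, root):
            """Nodes to probe for intervals intersecting the query [lo, hi].

            Intervals registered at nodes left of lo qualify if their upper
            bound >= lo; those at nodes right of hi qualify if their lower
            bound <= hi; everything registered at nodes inside [lo, hi]
            qualifies outright.
            """
            left, right = [], []
            node, step = root, root // 2
            while node != lo and step > 0:  # path from the root down to lo
                if node < lo:
                    left.append(node)
                    node += step
                else:
                    node -= step
                step //= 2
            node, step = root, root // 2
            while node != hi and step > 0:  # path from the root down to hi
                if node > hi:
                    right.append(node)
                    node -= step
                else:
                    node += step
                step //= 2
            return left, right

        print(fork_node(4, 10, root=8))   # 8
        print(query_nodes(5, 6, root=8))  # ([4], [8])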
  • Conference paper
    Web data extraction for business intelligence: the Lixto approach
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Baumgartner, Robert; Fröhlich, Oliver; Gottlob, Georg; Harz, Patrick; Herzog, Marcus; Lehmann, Peter
    Knowledge about market developments and competitor activities is increasingly becoming a critical success factor for enterprises. The World Wide Web provides publicly available information which can be retrieved, for example, from Web sites or online shops. The extraction of such information from semi-structured sources is mostly done manually and is therefore very time-consuming. This paper describes how public information can be extracted automatically from Web sites, transformed into structured data formats, and used for data analysis in Business Intelligence systems.
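    A minimal sketch of extracting structured records from a semi-structured page, using only the Python standard library (the markup classes and field names are invented for this example; Lixto itself defines extraction patterns visually and far more robustly):

        from html.parser import HTMLParser

        class PriceExtractor(HTMLParser):
            """Collect (name, price) rows from spans with known CSS classes."""

            def __init__(self):
                super().__init__()
                self.field, self.current, self.rows = None, {}, []

            def handle_starttag(self, tag, attrs):
                cls = dict(attrs).get("class")
                if tag == "span" and cls in ("name", "price"):
                    self.field = cls  # the next text node fills this field

            def handle_data(self, data):
                if self.field:
                    self.current[self.field] = data.strip()
                    self.field = None
                    if len(self.current) == 2:  # one complete record
                        self.rows.append((self.current.pop("name"),
                                          self.current.pop("price")))

        page = '<div><span class="name">Widget</span><span class="price">9.99</span></div>'
        extractor = PriceExtractor()
        extractor.feed(page)
        print(extractor.rows)  # [('Widget', '9.99')]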
  • Conference paper
    From content management to enterprise content management
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Mega, Cataldo; Wagner, Frank; Mitschang, Bernhard
  • Text document
    A learning optimizer for a federated database management system
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Ewen, S.; Ortega-Binderberger, M.; Markl, V.
    Optimizers in modern DBMSs utilize a cost model to choose an efficient query execution plan (QEP) among all possible ones for a given query. The accuracy of the cost estimates depends heavily on accurate statistics about the underlying data. Outdated statistics or wrong assumptions in the underlying statistical model frequently lead to suboptimal selection of QEPs and thus to bad query performance. Federated systems require additional statistics on remote data to be kept on the federated DBMS in order to choose the most efficient execution plan when joining data from different data sources. Wrong statistics within a federated DBMS can cause not only suboptimal data access strategies but also unbalanced workload distribution as well as unnecessarily high network traffic and communication overhead. Maintaining statistics in a federated DBMS is troublesome because the remote DBMSs are independent: they might not expose their statistics, might use different statistical models, or might not collect all statistics needed by the federated DBMS. We present an approach that extends DB2's learning optimizer to automatically find flaws in statistics on remote data by extending its query feedback loop towards the federated architecture. We discuss several approaches to obtaining feedback from remote queries and present our solution, which utilizes both local and remote query feedback and can also trigger and drive iterative sampling of remote data sources to retrieve the information needed to compute statistics profiles. We provide a detailed performance study and analysis of our approach, and demonstrate in a case study a potential query execution speedup of orders of magnitude while incurring only a moderate overhead during query execution.
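    A minimal sketch of the feedback-loop idea, i.e. flagging statistics whose cardinality estimates disagree with observed execution (the threshold and data layout are assumptions of this sketch, not DB2's actual learning-optimizer implementation):

        def feedback_adjustments(feedback, threshold=2.0):
            """Derive correction factors from query feedback records.

            feedback: list of (predicate, estimated_card, actual_card) tuples
            gathered from local and remote query execution. Predicates whose
            estimation error exceeds the threshold are flagged so that their
            statistics can be recomputed, e.g. by sampling the remote source.
            """
            flagged = {}
            for predicate, est, act in feedback:
                error = max(est, 1) / max(act, 1)
                if error > threshold or error < 1 / threshold:
                    flagged[predicate] = act / max(est, 1)  # multiplicative fix
            return flagged

        fb = [("orders.year = 2004", 1000, 95000), ("cust.country = 'DE'", 400, 410)]
        print(feedback_adjustments(fb))  # {'orders.year = 2004': 95.0}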
  • Conference paper
    Das Common Warehouse Metamodel als Referenzmodell für Metadaten im Data Warehouse und dessen Erweiterung im SAP® Business Information Warehouse
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Hahne, Michael
    Heterogeneous data warehouse landscapes are characterized by a multitude of different software components, and integrating them into a working business intelligence solution poses a particular challenge. The metadata of the components involved offers a promising approach to effective and efficient interconnection, which is, however, hampered by proprietary metadata models. The Common Warehouse Metamodel (CWM), adopted by the Object Management Group, has since established itself as an industry-wide standard for modeling metadata in data warehouse systems and for exchanging it. The architecture and platform independence of this metamodel means that the specific objects of dedicated tools are not covered, but the CWM provides flexible extension mechanisms for this purpose. The SAP Business Information Warehouse (BW) is one of the leading data warehouse systems and uses an extension of the CWM for metadata exchange. In addition to representing the specific data model of the BW, this specific metamodel notably also covers authorizations and the SAP role concept.
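    A minimal sketch of the extension idea, i.e. deriving tool-specific metadata classes from standard, tool-neutral ones (the class and attribute names are invented for illustration and model neither the actual CWM classes nor SAP BW's metamodel):

        from dataclasses import dataclass, field

        @dataclass
        class Cube:  # stands in for a standard, tool-neutral metadata class
            name: str
            dimensions: list = field(default_factory=list)

        @dataclass
        class BWInfoCube(Cube):  # vendor extension adds authorization metadata
            authorized_roles: list = field(default_factory=list)

        cube = BWInfoCube("SALES", dimensions=["0CALMONTH", "0MATERIAL"],
                          authorized_roles=["SAP_BW_REPORTING"])
        print(cube)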
  • Conference paper
    The importance of being earnest about definitions
    (Datenbanksysteme in Business, Technologie und Web, 11. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 2005) Thomas, Susan
    Ideas from terminology management, the science of terms and definitions, can be used to improve the quality of software and data models, as well as to facilitate some current goals of computer science, such as the realization of the Semantic Web and the reuse of software components. This paper explains the principles and methods of terminology management and relates them to problems in computer science. The predominant aim of the paper is to raise awareness of both the importance of definitions and the difficulties associated with them, in order to pave the way for the effective use of terminology management in software projects.