Datenbank Spektrum 12(3) - November 2012

    Data Management Challenges in Next Generation Sequencing
    (Datenbank-Spektrum: Vol. 12, No. 3, 2012) Wandelt, Sebastian; Rheinländer, Astrid; Bux, Marc; Thalheim, Lisa; Haldemann, Berit; Leser, Ulf
    Since the early days of the Human Genome Project, data management has been recognized as a key challenge for modern molecular biology research. By the end of the nineties, technologies had been established that adequately supported most ongoing projects, typically built upon relational database management systems. However, recent years have seen a dramatic increase in the amount of data produced by typical projects in this domain. While it took more than ten years, approximately three billion USD, and more than 200 groups worldwide to assemble the first human genome, today’s sequencing machines produce the same amount of raw data within a week, at a cost of approximately 2000 USD, and on a single device. Several national and international projects now deal with (tens of) thousands of genomes, and trends like personalized medicine call for efforts to sequence entire populations. In this paper, we highlight challenges that emerge from this flood of data, such as parallelization of algorithms, compression of genomic sequences, and cloud-based execution of complex scientific workflows. We also point to a number of further challenges that lie ahead due to the increasing demand for translational medicine, i.e., the accelerated transition of biomedical research results into medical practice.
    Der 175. Datenbankstammtisch an der HTW Dresden [The 175th Database Roundtable at HTW Dresden]
    (Datenbank-Spektrum: Vol. 12, No. 3, 2012) Wloka, Uwe; Gräfe, Gunter
    Bericht über den 24. GI-Workshop “Grundlagen von Datenbanken” [Report on the 24th GI Workshop “Foundations of Databases”]
    (Datenbank-Spektrum: Vol. 12, No. 3, 2012) Schmitt, Ingo; Höpfner, Hagen
    CityPlot: Colored ER Diagrams to Visualize Structure and Contents of Databases
    (Datenbank-Spektrum: Vol. 12, No. 3, 2012) Dugas, Martin; Vossen, Gottfried
    CityPlot generates an extended version of a traditional entity-relationship diagram for a database. It is intended to provide a combined view of database structure and contents. The graphical output resembles the metaphor of a city. Data points are visualized according to data type and completeness. An open source reference implementation is available from http://cran.r-project.org/.
    XPath and XQuery Full Text Standard and Its Support in RDBMSs
    (Datenbank-Spektrum: Vol. 12, No. 3, 2012) Petković, Dušan
    Full-text queries search for information in documents using words and several special operators. The search process therefore operates on documents, and the results returned are again documents. XQuery Full Text search provides a narrower form of full-text query, in which the search process operates on parts of XML documents, such as elements and attributes. In this paper, we first discuss XPath and XQuery Full Text as specified in the W3C standard. The features described in the standard are divided into several groups and illustrated with examples. After that, we show to what extent these features are supported in RDBMSs. The summary of the paper lists the semantic similarities and differences between the Full Text features described in the standard on the one hand and the features supported in recent versions of IBM DB2, Oracle, and MS SQL Server on the other. This part of the paper also shows that the syntax of the features described in the standard is not supported by any of the systems considered.
    Towards Large-Scale Meteorological Data Services: A Case Study
    (Datenbank-Spektrum: Vol. 12, No. 3, 2012) Misev, Dimitar; Baumann, Peter; Seib, Jürgen
    Meteorological data contribute significantly to “Big Data”, involving multi-dimensional raster data cubes of up to 5-D, with single cubes reaching multi-petabyte sizes. Due to the lack of support for raster data, file-based implementations rather than databases have traditionally been used for serving such data to the community. Array databases overcome this by providing storage and query support. In this paper, we present a case study conducted by Deutscher Wetterdienst (DWD) in which the extraction and processing of gridded meteorological data sets were investigated hands-on. Following a brief introduction to the rasdaman DBMS used, we present the database schema and a series of array queries, selected according to their practical importance in weather forecast services. We discuss several issues that came up, such as null values and time modeling, and how they were addressed. To the best of our knowledge, this is the first non-academic deployment of an array database for up to 5-D data sets.
    Scientific Workflows and Provenance: Introduction and Research Opportunities
    (Datenbank-Spektrum: Vol. 12, No. 3, 2012) Cuevas-Vicenttín, Víctor; Dey, Saumen; Köhler, Sven; Riddle, Sean; Ludäscher, Bertram
    Scientific workflows are becoming increasingly popular for compute-intensive and data-intensive scientific applications. The vision and promise of scientific workflows includes rapid, easy workflow design, reuse, scalable execution, and other advantages, e.g., to facilitate “reproducible science” through provenance (e.g., data lineage) support. However, as described in the paper, important research challenges remain. While the database community has studied (business) workflow technologies extensively in the past, most current work in scientific workflows seems to be done outside of the database community, e.g., by practitioners and researchers in the computational sciences and eScience. We provide a brief introduction to scientific workflows and provenance, and identify areas and problems that suggest new opportunities for database research.